
STATISTICS & DATA SCIENCE NEWSROOM
Hosted by Chong Ho (Alex) Yu, SCASA President-Elect

Posted on April 19, 2024

This week Microsoft unveiled a $1.5 billion investment in G42, an AI firm based in the United Arab Emirates (UAE). While this sum may appear modest compared to other AI investments, it carries significant geopolitical implications. Amid growing U.S. apprehensions regarding Middle Eastern countries deepening their connections with China, this investment can be seen as a reaffirmation of ties between the U.S. and the Arab world. Beyond this financial commitment, the deal also includes support for a new $1 billion developer fund aimed at cultivating an AI workforce and fostering innovation within the region.

That’s my take on it:

This deal might have another geopolitical implication. On April 13 Iran launched a massive attack on Israel, and Israel retaliated yesterday. There is now a growing fear of regional escalation. Although the deal was negotiated between the two parties long before these events, the announcement of Microsoft's investment in the UAE at this moment sends a reassuring signal to the region, suggesting that major tech players maintain confidence in its stability despite the volatile geopolitical landscape.

Link: https://news.crunchbase.com/ai/venture-tech-startups-msft-g42-uae/

Posted on April 19, 2024

Yesterday (April 18), Meta made headlines by introducing Llama 3, its latest AI model designed to rival offerings from Microsoft and Google. This updated version boasts enhanced reasoning abilities, aiming to emulate human-like intelligence. Meta asserts that Meta AI "is now the most intelligent AI assistant people can use for free," with expanded availability across numerous countries. Beyond consumer applications, Llama 3 models are slated to soon be integrated with various platforms, including AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake, with support from hardware platforms provided by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm. Additionally, Meta touts accelerated image generation capabilities, showcasing the ability to generate images in real time.

That’s my take on it:

To examine Meta AI's claim of superiority, I posed an identical query to Microsoft Copilot, Perplexity AI, and Meta AI: "How many Arab countries have diplomatic ties with Iran?" While the former two provided detailed and current accounts, Meta AI's answer was vague and brief: “The United Arab Emirates maintains diplomatic relations with Iran, despite relations between the neighboring countries being shaky and unpredictable ¹. The dynamic between the Arab League and the Islamic Republic of Iran has been ambivalent, due to Iran's varying bilateral conduct with each country of the Arab League.”

Another inquiry regarding Arab countries' diplomatic ties with Israel revealed disparities in the responses. Meta AI listed Egypt, Jordan, the UAE, Bahrain, and Morocco, but Microsoft Copilot and Perplexity AI noted that Sudan also accepted the Abraham Accords, though it has not yet finalized a treaty with Israel. Such discrepancies cast doubt on Meta AI's status as the most intelligent AI assistant.

Further examination of Meta AI's image generation feature using a specific prompt yielded underwhelming results: “A female hula dancer underwater, sun rays from the water above, photorealistic, tack sharp focus.”  The attached JPEG image is by no means tack sharp, failing to meet expectations. Despite this, Meta AI distinguishes itself with the ability to generate animated images, as evidenced by the attached MP4 clip.

Link: https://ai.meta.com/blog/meta-llama-3/

Posted on April 17, 2024

Today Boston Dynamics announced the retirement of the hydraulic version of its humanoid robot Atlas, which was introduced in 2013. Its successor, the all-electric Atlas, is not only lighter but also more capable. While its predecessor could mimic human movements, the electric iteration boasts 360-degree mobility for its limbs and head. Prioritizing task-specific mobility and manipulation, the new Atlas exceeds human capabilities, particularly in tasks considered dull, dirty, and dangerous.

That’s my take on it:

According to TechRadar, despite being powered by the latest AI technologies and resembling humans more closely, the new Atlas still has a long way to go before becoming commercially feasible. Boston Dynamics plans to initially test it with company investor Hyundai to explore applications for consumers. Nevertheless, given the rapid pace of AI and robotic advancements, I am optimistic. I firmly believe that widespread adoption of AI-powered robotics will occur within my lifetime. One prospective application could involve deploying robotic troops to conflict zones for precise targeting, thus minimizing civilian casualties.

Link: https://www.youtube.com/watch?v=29ECwExc-_M

Link to YouTube movie comparing New Atlas Electric and Tesla Optimus Gen 2:

https://www.youtube.com/watch?v=9Bj77zcY-HM

Posted on April 15, 2024

As you may already know, artificial neural networks (ANNs) have been at the forefront of AI research for almost two decades, drawing inspiration from cognitive psychology and neuroscience. Brain-inspired computing, acknowledged by the International Semiconductor Association as one of the most promising disruptive technologies post-Moore's Law, has gained increasing recognition. A recent comprehensive review on Hybrid Neural Networks (HNNs), conducted by researchers from Tsinghua University, China, was published in the National Science Review. HNNs seamlessly integrate Spiking Neural Networks (SNNs) rooted in neuroscience with ANNs based on computer science. Leveraging the distinct advantages of these heterogeneous networks in information representation and processing, HNNs infuse fresh vigor into Artificial General Intelligence (AGI) development.
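
To make the idea concrete, here is a minimal toy sketch (my own illustration in Python, not the Tsinghua group's architecture) of the hybrid SNN+ANN pattern: a leaky integrate-and-fire spiking layer encodes the input, and a conventional dense layer reads out the spike rates.

```python
import numpy as np

rng = np.random.default_rng(0)

def lif_encode(x, steps=50, tau=0.9, v_th=1.0):
    """Encode an analog input vector as spike trains using
    leaky integrate-and-fire neurons (rate coding)."""
    v = np.zeros_like(x, dtype=float)       # membrane potentials
    spikes = np.zeros_like(x, dtype=float)  # accumulated spike counts
    for _ in range(steps):
        v = tau * v + x           # leaky integration of input current
        fired = v >= v_th         # threshold crossing
        spikes += fired
        v[fired] = 0.0            # reset after a spike
    return spikes / steps         # average firing rate per neuron

# A toy "ANN head": one dense layer plus softmax over the spike rates.
W = rng.normal(scale=0.5, size=(4, 3))
b = np.zeros(3)

def hybrid_forward(x):
    rates = lif_encode(x)          # SNN part (neuroscience-inspired)
    logits = rates @ W + b         # ANN part (computer-science-inspired)
    e = np.exp(logits - logits.max())
    return e / e.sum()

print(hybrid_forward(np.array([0.2, 0.8, 0.1, 0.5])))
```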

That’s my take on it:

The potential of HNN as a groundbreaking advancement in AI research awaits further exploration. Nevertheless, China is rapidly narrowing the gap, challenging American dominance in AI research. Data science and AI are inherently interdisciplinary, particularly in the fusion of neuroscience and AI. Despite this, many US universities still maintain a siloed approach. Psychology students often lack exposure to AI, while data science students receive limited formal education in cognitive psychology and neuroscience. To adequately prepare our future AI researchers, it may be time to restructure our curriculum.

Link:

https://techxplore.com/news/2024-04-advancing-brain-hybrid-neural-networks.amp?utm_source=ONTRAPORT-email-broadcast&utm_medium=ONTRAPORT-email-broadcast&utm_term=Newsletter&utm_content=Data+Science+Insider%3A+April+12th%2C+2024&utm_campaign=13042024

Posted on April 10, 2024

Today I attended a panel discussion titled "Now or Never for AI Policy?" organized by Project Syndicate. The panelists agreed that AI is so powerful that the consequences of misuse and mistakes can be disastrous. Gary Marcus, Emeritus Professor of Psychology and Neural Science at New York University, drew a comparison between AI and the Hydra, the nine-headed monster of Greek mythology. He emphasized the need for rigorous analysis of benefits and harms before the release of AI technologies, similar to the scrutiny applied to drug approvals in medicine.

Abeba Birhane, Senior Fellow in Trustworthy AI at the Mozilla Foundation, asserted that she could not see any positive applications of AI-enabled voice cloning; rather, the technology can be used for impersonation and scamming. The panel collectively emphasized trust as a critical issue in the realm of AI. To underscore this point, the host referenced a quote from Margrethe Vestager, Executive Vice President of the European Commission: “AI will not reach its immense positive potential unless end-users trust it. Here, even more than in many other fields, trust serves as an engine of innovation.”

That’s my take on it:

While acknowledging the risks associated with AI misuse and the necessity for regulations, I believe there are numerous promising applications for voice cloning technology. For instance, I am currently collaborating with an online education company to produce video lectures. AI voice cloning can significantly reduce production costs and streamline updates. Additionally, according to ID R&D, voice cloning of historical figures opens avenues for interactive teaching and dynamic storytelling. With AI voice cloning software, celebrity voices can narrate books, authors can read their autobiographies, and historical figures can recount their stories in their own voices. Moreover, voice cloning offers opportunities for those who have lost loved ones to interact with recreations of their voices. The potential benefits of AI voice cloning are vast and even limitless.

Link: https://www.youtube.com/@projectsyndicate

Posted on March 29, 2024

In a recent paper published in the Proceedings of the National Academy of Sciences (PNAS), a team of researchers has unveiled several intriguing discoveries regarding neural networks. Regardless of their architectural design or scale, neural networks demonstrate a consistent, low-dimensional trajectory in learning to classify images. Through extensive experimentation involving a diverse array of network types, including multi-layer perceptrons, convolutional and residual networks, as well as transformers like those utilized in systems such as ChatGPT, the researchers observed a convergence in the learning paths of these networks, indicating a shared approach to image classification. The findings of this study suggest the potential for developing highly efficient AI training algorithms that demand fewer computational resources. Employing techniques rooted in information geometry, the researchers were able to conduct a comparative analysis of different networks, revealing fundamental similarities in their learning methodologies.
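
As a toy illustration of the comparison idea, the sketch below represents each network's state at every training checkpoint by its predicted class distribution on a probe set and measures how far two models' trajectories diverge. The paper uses proper information-geometric machinery; the Hellinger distance and the simulated trajectories here are my simplified stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

def hellinger(p, q):
    """Hellinger distance between two discrete distributions,
    a simple information-geometric notion of dissimilarity."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def training_trajectory(target, epochs=10, lr=0.5, noise=0.02):
    """Simulate a model's predicted class distribution moving from
    uniform toward the target labels over training (a stand-in for
    checkpointed softmax outputs of a real network)."""
    p = np.full_like(target, 1.0 / len(target))
    traj = []
    for _ in range(epochs):
        p = (1 - lr) * p + lr * target + rng.normal(0, noise, len(target))
        p = np.clip(p, 1e-9, None)
        p /= p.sum()
        traj.append(p.copy())
    return traj

target = np.array([0.7, 0.2, 0.1])          # "true" class frequencies on a probe set
net_a = training_trajectory(target, lr=0.5)  # e.g., a CNN
net_b = training_trajectory(target, lr=0.3)  # e.g., a transformer

# If the two architectures follow the same low-dimensional path,
# checkpoint-by-checkpoint distances should stay small and shrink.
for t, (p, q) in enumerate(zip(net_a, net_b)):
    print(f"epoch {t}: Hellinger distance = {hellinger(p, q):.4f}")
```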

That’s my take on it:

The implications of this research are profound: it may pave the way for training neural networks at a reduced cost. Such advancements could empower data scientists to harness AI technologies more efficiently in addressing complex real-world challenges. Moreover, the alignment between artificial neural networks and their biological counterparts offers insights into the intersection of neuroscience and AI, underscoring their interdependent nature since the inception of AI research. This underscores the potential for interdisciplinary collaboration, suggesting that university courses on this subject could be jointly pursued by students majoring in data science and psychology alike.

Full paper: https://www.pnas.org/doi/10.1073/pnas.2310002121

Posted on March 27, 2024

OpenAI recently provided artists with access to Sora for experimental purposes. Utilizing this innovative text-to-video generative AI tool, a studio known as Shy Kids produced a captivating video titled "Air Head," portraying the life of a man with a balloon as his head. Sora's remarkable capability to seamlessly integrate the whimsical balloon head with a seemingly human body and lifelike surroundings is truly impressive. Another noteworthy creation is the video "Beyond Our Reality" by digital artist Don Allen Stevenson III. This piece resembles a surreal nature documentary, showcasing unprecedented animal hybrids such as the Girafflamingo, flying pigs, and the Eel Cat. Each creature appears as though created by a mad scientist, meticulously melding disparate animal features to form these fantastical chimeras.

 

That’s my take on it:

The current duration of most demo videos ranges from mere seconds to a few minutes. The resource requirements for rendering longer videos, spanning 30 minutes to an hour, remain uncertain, though it's conceivable that a significant array of GPUs will be necessary. Undoubtedly, as this technology advances, it will become both more sophisticated and more cost-effective. Historically, Hollywood has embraced cutting-edge technology in filmmaking, and it's foreseeable that this advancement will eventually render conventional CGI techniques obsolete. Consequently, visual effects artists may need to adapt or face displacement, prompting a need for upskilling. However, the viability of smaller studios focused on marketing and advertising in the long term is uncertain, given the transformative nature of this technological shift.

 

Links:

https://www.techradar.com/computing/artificial-intelligence/openai-just-gave-artists-access-to-sora-and-proved-the-ai-video-tool-is-weirder-and-more-powerful-than-we-thought

 

https://www.youtube.com/watch?v=IS0xphCc5rI

Posted on March 20, 2024

DSML trend: Elsevier journal fails to detect a paper written by ChatGPT

A recent research paper from four scholars in China, published in an Elsevier journal, has attracted widespread attention online due to its opening sentence: “Certainly, here is a possible introduction for your topic.” This line, identified by an academic investigator, suggests the involvement of ChatGPT in the paper's creation, given the phrase's resemblance to typical AI-generated content starters. The investigator questioned how such an obvious sign of fraud could bypass the scrutiny of coauthors, editors, referees, copy editors, and typesetters. Further scrutiny from the academic community has revealed additional issues, including identical data and graphs recycled by the authors across different papers, despite claims of presenting new instances. On March 12 the publisher replied, “our policies are clear that LLMs can be used in the drafting of the papers as long as it is declared by the authors on submission. We are investigating this paper and are in discussion with Editorial Team and authors.”

That’s my take on it:

While the aforementioned paper has come under the spotlight, I believe numerous similar papers remain undetected. Had the authors omitted the initial sentence resembling chatbot output, the paper would likely have escaped the scrutiny of academic detectives, as thousands of academic papers are published every week. Some of my students used the internet to plagiarize before AI tools became popular. Nevertheless, I was able to identify them without the assistance of Turnitin or SafeAssign, because those students copied everything, even the blue-underlined hyperlinks and the pronoun "we," though the assignment was meant to be written by the student alone. However, as cheaters learn to avoid such obvious errors, I worry that academic fraud will become increasingly difficult to identify. Hence, it is imperative to offer classes and workshops on AI ethics.
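
A first line of defense can be automated: a simple scan for telltale chatbot boilerplate, such as the opening sentence that exposed this paper. The sketch below is a minimal illustration; the phrase list is my own and is far from exhaustive.

```python
import re

# A few telltale chatbot openers (illustrative list, not exhaustive).
CHATBOT_PHRASES = [
    r"certainly,? here is a possible introduction",
    r"as an ai language model",
    r"i hope this helps",
    r"regenerate response",
]

def flag_suspect_passages(text):
    """Return any sentences that contain known chatbot boilerplate."""
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for pattern in CHATBOT_PHRASES:
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                hits.append((pattern, sentence.strip()))
    return hits

manuscript = ("Certainly, here is a possible introduction for your topic. "
              "Lithium-metal batteries are promising candidates...")
for pattern, sentence in flag_suspect_passages(manuscript):
    print(f"matched /{pattern}/: {sentence}")
```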

Original paper:

https://www.sciencedirect.com/science/article/pii/S2468023024002402

Link to investigation and discussion:

https://pubpeer.com/publications/CAABBF887348FB2D1C0329E0A27BE6

Posted on March 20, 2024

Today I delivered a talk about AI ethics at the following conference. The presentation can be downloaded from the link below and the recording will be available later. It is a controversial topic and thus you are welcome to disagree and give me feedback. Thank you for your attention.

Yu, C. H. (2024, March). Inclusive futures: Ethical implications of AI and its impact on marginalized communities. Ethics Roundtable: Association for Practical and Professional Ethics, Online.

https://www.creative-wisdom.com/pub/Yu_2024_APPE_AI_ethics.pdf

Posted on March 20, 2024

DSML trend: Respected journal publishes study featuring nonsensical AI images

I didn't notice the following old news until today because it wasn't covered by the mainstream media. Last month an article with nonsensical photos generated by AI, including an image of a giant rat's organ, was retracted by the respected open access journal Frontiers in Cell Development and Biology. Three academicians from China wrote the manuscript, which was edited by an Indian researcher and reviewed by two other scholars. Shortly after the scandal was exposed, the authors admitted that the images were generated by Midjourney. To mitigate this PR disaster, the journal issued the following statement: “We are investigating how our processes failed to act on the lack of author compliance with the reviewers' requirements. We sincerely apologize to the scientific community for this mistake and thank our readers who quickly brought this to our attention.”

That’s my take on it:

Because these authors lacked sophistication, their cheating was detected. Although Midjourney is capable of creating photorealistic images and scientific illustrations, the rat image in question appeared more cartoon-like, with accompanying labels containing misspellings and gibberish words, like 'dck' and 'testtomcels.' The authors didn't even bother to retype the words in Adobe Photoshop, though it could easily have been done. It is unbelievable that a manuscript of this quality could make it through peer review. While Midjourney has trouble producing correct spellings, a few other AI-based art generation tools can output the precise spellings specified by the user (sorry, I don't want to name and advertise those AI programs). I worry that the academic community will see an increasing number of fabrication instances aided by AI techniques in the future.

Links:

https://www.vice.com/en/article/4a389b/ai-midjourney-rat-penis-study-retracted-frontiers

https://www.pcmag.com/news/academic-journal-retracts-study-after-ai-generated-rat-penis-pics-go-viral

https://www.nature.com/articles/d41586-024-00659-8

Posted on March 15, 2024

DSML: Google Gemini refuses to answer questions about elections

Due to worries about spreading misinformation, Google's Gemini AI chatbot is now prohibited from responding to questions concerning elections in nations with upcoming elections, such as the US and India. In a blog post published last December, the corporation first revealed its intention to restrict questions about elections. In February, it made a similar announcement about the European parliamentary elections. Although Google's post from March 12 focused on India's impending election, it has now been verified that Google is implementing the changes worldwide. If you ask questions such as “tell me about President Biden,” “who is Donald Trump,” or “Between Biden and Trump, which candidate is tougher on crime,” Gemini replies: “I’m still learning how to answer this question. In the meantime, try Google search.” Even if you ask a less subjective question such as “how to register to vote,” Gemini will also redirect you to Google search.

That’s my take on it:

Before reading Google’s announcement, this morning my colleague and I were discussing the limitations of AI in education, highlighting that while AI can provide quick answers, it falls short in fostering critical thinking among students. This is because AI chatbots often offer politically correct responses to sensitive or controversial topics such as religion, ethnicity, gender, and politics. In light of Google's announcement, my concerns have been validated as Gemini now avoids providing any information on elections. This underscores the indispensable role of educators in guiding students through complex discussions and nurturing critical thinking skills.

Link:

https://www.businessinsider.com/google-restricts-gemini-ai-chatbot-answering-election-questions-trump-biden-2024-3

Posted on March 6, 2024

DSML Trend: Anthropic releases new Claude features

On March 3, Anthropic unveiled the latest addition to their AI model suite, the Claude 3 family, featuring three distinct models ranked by increasing complexity and capability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. Anthropic boasts that their premier model, Opus, surpasses the performance of both ChatGPT and Google Gemini across several standard AI benchmarks, such as undergraduate level expert knowledge (MMLU), graduate level expert reasoning (GPQA), basic mathematics (GSM8K), and others. Opus demonstrates almost human-like understanding and expression, pushing the boundaries of general artificial intelligence. Unlike its predecessors, which were limited to processing text inputs, the capabilities of Claude 3's Opus and Sonnet extend to interpreting visual data, including charts, graphs, and even PDF files.

That’s my take on it:

I haven't managed to explore all the innovations packed into Claude's upgrade. Keeping up with AI's evolution is daunting, as it advances at a pace unprecedented in the history of software development. Take the Statistical Analysis System (SAS) as an example: SAS hit the market in 1976, and its current iteration is version 9.4, indicating an average interval of roughly five years between significant updates. The R programming language was released to the public in 1993, with its latest version, 4.3.3, released in February 2024. Users previously had several years to assimilate new information, but now it seems that by the time one masters a new skill, it's already outdated. Ironically, while AI boosts my efficiency at work, it hasn't saved me time. The rapid succession of updates has made me busier than ever.

 

Moreover, this rapid technological advancement presents a challenging scenario for educators such as myself. If I hastily dedicate myself to one specific system and it's eventually eclipsed by a competitor, my investment of time and resources could be wasted. On the other hand, a cautious approach of 'wait and see' might leave me trailing behind the curve. Consequently, I find myself compelled to engage with several AI systems simultaneously, biding my time until a definitive leader or winner in the field becomes apparent. Please let me know if you have a better approach.

Links:

https://www.anthropic.com/news/claude-3-family

https://www.pcmag.com/news/anthropic-launches-claude-3-ai-rival-chatgpt-4

Posted on March 2, 2024

DSML: Elon Musk sues OpenAI for its deviation from the original mission

Yesterday (Feb 29, 2024) Elon Musk initiated legal proceedings against OpenAI and its CEO, Sam Altman. Musk alleges that OpenAI, now a for-profit company, has deviated from its original mission of advancing AI for the greater good. OpenAI currently collaborates with Microsoft, incorporating ChatGPT into Microsoft's suite of products and services. Musk is pushing for a judicial decree that would mandate OpenAI to disseminate its research and technologies publicly, thereby prohibiting the company from leveraging its capabilities, including GPT-4, for the financial benefit of Microsoft or any other party.

That’s my take on it:

First, I have reservations about the idea of disseminating R&D results in an open-source manner, especially considering the potential for misuse is substantial. Second, cutting-edge AI research is expensive. Without profit generation, OpenAI may find it challenging to secure sufficient investment to sustain its innovative endeavors. Third, the dichotomy between profit and non-profit models in AI isn't clear-cut; profit-driven entities can still significantly contribute to societal well-being. Even as a proprietary tool, ChatGPT has the potential to yield substantial benefits across various sectors, such as education, healthcare, and the arts. Ultimately, the court will decide on the outcomes of the lawsuit. But the larger question remains: how can we ensure responsible AI development with both financial sustainability and safeguards for humanity? A potential solution could be for OpenAI to operate a dual structure, where a commercial division supports research funding and a separate non-profit segment focuses on AI ethics and safety-oriented studies.

Link: https://www.reuters.com/legal/elon-musk-sues-openai-ceo-sam-altman-breach-contract-2024-03-01/

Posted on March 1, 2024

DSML trend: Many data science models are not adopted and deployed

Joe McKendrick highlighted an ongoing challenge in a ZDNet article dated February 28, 2024, titled "Data scientists: Still the sexiest job - if anyone would just listen to them." He discussed how leaders often do not implement the findings and recommendations of data scientists. Citing a Rexer Analytics survey, McKendrick noted that only 22% of data scientists reported that their models were actually deployed. Echoing this sentiment, a KDnuggets article revealed that 43% of respondents indicated that 80% or more of their new models never made it to deployment. Furthermore, fewer than half of data scientists (49%) believed that the managers and decision-makers responsible for approving model deployment in their organizations had sufficient knowledge to make informed decisions.

That’s my take on it:

This issue is not exclusive to the field of data science and machine learning. In the past, analysts employing various statistical models encountered similar obstacles. For example, W. Edwards Deming was best known for his work with Toyota, where his quality-control ideas were broadly adopted and credited with contributing significantly to the company's economic revival and global recognition for the quality of its products; yet his ideas were not recognized by US companies until the late 1980s and 1990s. Today, the hesitancy of some US managers to deploy data science models can be attributed to a variety of factors. Data science models can be complex (e.g., the black-box nature of neural networks), and not all managers have the technical background to fully understand them. This lack of understanding can lead to skepticism regarding their effectiveness. In addition, the deployment of data science models often requires a substantial upfront investment in technology and talent. Some managers may be unsure about the return on investment, especially if the benefits are not immediate or easily quantifiable.

Link: https://www.zdnet.com/article/data-scientists-still-the-sexiest-job-if-anyone-would-just-listen-to-them/

Posted on February 29, 2024

DSML Trend: Alibaba introduces Emote Portrait Alive

The field of AI video generation is advancing rapidly. Researchers at the Institute for Intelligent Computing affiliated with Alibaba Group have made strides with their development of the Emote Portrait Alive (EMO) model. By using a single still image and a clip of audio input, such as speech or song, EMO is capable of producing videos that showcase dynamic facial expressions and a range of head movements. The duration of the videos is flexible, adaptable to the length of the provided audio.

That’s my take on it:

Their website features 17 sample video snippets; to see them all, you must scroll down. The demo set includes several well-known people as avatars, such as a young Leonardo DiCaprio, Audrey Hepburn, the Mona Lisa, the late Hong Kong singer Leslie Cheung Kwok Wing, and others. They give incredibly convincing and lifelike performances in the video clips. One possible use case for this technology: those who have lost a loved one (a spouse, parent, or child) could construct a video replica of the departed. If connected to a large language model, the avatar could even hold interactive conversations, just like a real person.

Link: https://humanaigc.github.io/emote-portrait-alive/

Posted on February 27, 2024

On February 26, 2024, Microsoft unveiled a multi-year partnership with the new French startup Mistral, established just ten months ago, to propel their AI model to the commercial market. This partnership marks Mistral's entry as a strong contender in large language models via Microsoft's Azure cloud service. Together, they are set to co-create solutions tailored for European state agencies, leveraging AI to cater to the unique demands of the public sector.

Coinciding with the announcement of this alliance, Mistral introduced their latest AI endeavor, "Mistral Large," boasting capabilities on a par with OpenAI's GPT-4 and Google's Gemini Ultra in certain cognitive tasks. The development of this model incurred costs below 20 million euros, a modest figure in contrast to OpenAI's GPT-4, which, as CEO Altman noted last year, commanded a budget of $50 to $100 million or more for its training.

That’s my take on it:

Microsoft has been collaborating with OpenAI, so some may wonder why Microsoft is courting another AI ally now. Mistral is attractive for its promise of cost-efficiency. Mistral takes its name from a strong wind in France, positioning itself as a fresh wind against the lavish expenditures traditionally seen in AI development and operations. Giants like Microsoft-backed OpenAI and Google's parent Alphabet pour billions into crafting and refining state-of-the-art AI technologies, which in turn consume vast financial resources, especially for the energy-intensive processors required. A 2023 study highlighted the staggering operational costs of ChatGPT, topping $700,000 daily. Microsoft's strategy appears to be cost-reduction oriented. Latitude, a gaming firm that spends $200,000 monthly on AI operations, openly seeks more economical options. This fierce competition will eventually lower costs, and all stakeholders will benefit.

Links:

https://azure.microsoft.com/en-us/blog/microsoft-and-mistral-ai-announce-new-partnership-to-accelerate-ai-innovation-and-introduce-mistral-large-first-on-azure/

Posted on February 23, 2024

DSML trend: Stable Diffusion 3.0 has more safeguards

On February 22, 2024, Stability AI unveiled Stable Diffusion 3.0, marking a significant upgrade from its forerunners. This new version can produce highly detailed images featuring multiple subjects, and boasts enhanced precision in aligning with textual prompts. The suite encompasses models spanning a wide spectrum of complexity, ranging from 800 million to 8 billion parameters, facilitating local operation on devices as diverse as smartphones and server-grade hardware. The parameter count is indicative of a model's flexibility and scalability, with larger models capable of generating more nuanced details, albeit at the cost of greater VRAM requirements for GPU processing.

That’s my take on it:

In the past, art generation tools utilizing Stable Diffusion have been less restrictive than proprietary alternatives, allowing artists the freedom to generate images like 'The Birth of Venus,' 'Lady Godiva,' 'Nude Maja,' or 'Olympia.' However, with the latest iteration, Stability AI is pivoting toward more stringent use policies: “We believe in safe, responsible AI practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors. Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment. In preparation for this early preview, we’ve introduced numerous safeguards.” Although these measures are intended to prevent misuse, they might also inadvertently impinge upon artistic freedom and limit artists' creativity.

Link:  https://stability.ai/news/stable-diffusion-3

Posted on February 23, 2024

DSML trend: Google Gemini image generator is temporarily offline due to historically inaccurate images

Currently the Google Gemini image generator is offline. If you enter a prompt requesting a portrait from Gemini, the following message pops up: “We are working to improve Gemini’s ability to generate images of people. We expect this feature to return soon and will notify you in release updates when it does.” The issue arose when Google Gemini, in an attempt to address AI biases concerning race and gender, produced images that were factually incorrect. For example, prompting it with "1943 German soldier" yielded images that included black and Asian female soldiers, which is historically inaccurate. Similar problems were seen with prompts that resulted in black Vikings, a female pope, women in the NHL, the Google founders as Asian men, and non-white depictions among the U.S. Founding Fathers.

That’s my take on it:

When I used the same prompt in Midjourney and Stable Diffusion, their outputs, while not perfectly historically accurate (such as in uniform details), did not feature any non-white characters. These incidents reflect a broader trend in technology where solutions can sometimes create new challenges. For instance, during DALL·E's early development, OpenAI implemented filters to remove sexualized images of women, but this inadvertently led to a reduced representation of women in its outputs. Social media platforms, designed to foster connections and tailor user experiences, have faced criticism for enabling misinformation, echo chambers, and social divides. There is no fool-proof technology. Nevertheless, I trust that in an open society scientific inquiry is a self-correcting process in the long run.

Posted on February 22, 2024

Today my colleagues and I presented the following paper at a conference.

Cheung, J. T., Yoon, S. S., & Yu, C. H. (2024, February). Will you be judged greedier if you know your acquisitive action is causing harm to others? Paper presented at the 33rd Annual Association for Practical and Professional Ethics International Conference, Cincinnati, OH.

My co-author participated from Ohio in person, while I contributed to our presentation through Zoom. In our study, we integrated a variety of analytical methods, including classical statistics, Bayesian approaches, and data science methods. Interestingly, these methods did not yield a unanimous conclusion. Moreover, within the data science techniques themselves, there were slight discrepancies between the outcomes of the penalized regression model and the decision tree. This reflects the intrinsic uncertainty in scientific research. Rather than limiting ourselves to a single methodology, I believe we should examine data through multiple lenses. In today's world, we champion diversity and inclusiveness across many dimensions. Perhaps we should also embrace methodological diversity. Examining questions from multiple methodological standpoints allows us to gain a richer, more holistic understanding.

The PDF version of the presentation slides can be viewed at:  https://creative-wisdom.com/pub/2024_APPE_conference_presentation.pdf

Posted on February 22, 2024

An article from the New York Times dated February 21, 2024 reports that China has been devoting significant effort to the development of generative AI. One Chinese company, 01.AI, has built its generative AI system based on LLaMA, the AI model introduced by Meta. Mr. Lee, the founder of 01.AI, stated that leveraging open-source technology is a standard practice. As Chinese companies look to open-source AI models from the United States to bridge the technology gap, this presents a dilemma for Washington. Despite efforts to prevent China from obtaining advanced microchips, such as Nvidia's GPUs, the U.S. continues to openly release software to anyone who wants it.

That’s my take on it:

The open-source model operates on an optimistic view of human nature, assuming a willingness among people to contribute and assist one another. According to Linus's Law, "Given enough eyeballs, all bugs are shallow." However, a transparent system can be misused. First, it can be seen as unfair to innovators or the original creators of the ideas. Second, making source code public, particularly for security software and AI, can help hackers attack the system. Despite these concerns, the debate persists, and the open-source model seems to be here to stay.

Link:

https://www.nytimes.com/2024/02/21/technology/china-united-states-artificial-intelligence.html

Posted on February 16, 2024

DSML trend: Meta (Facebook) releases V-JEPA

Yesterday (February 15, 2024) Meta (formerly Facebook) unveiled the Video Joint Embedding Predictive Architecture (V-JEPA) model, a non-generative AI model that employs self-supervised learning. The primary objective of V-JEPA is to develop advanced machine intelligence capable of learning in a manner more akin to humans, by constructing internal models of the surrounding world to acquire, adapt, and formulate plans efficiently in order to tackle complex tasks. The "V" in V-JEPA denotes "video," indicating its current focus solely on visual content within videos. Nevertheless, Meta's AI researchers are actively exploring a multimodal approach.

Meta's AI researchers posit that it is feasible to train JEPA models using video data without rigorous supervision, enabling them to observe videos passively, similar to how infants absorb information about their surroundings.
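
For readers curious about the mechanics, here is a minimal numpy sketch of the joint-embedding predictive idea as I understand it: encode the visible patches, predict the representations of the masked patches, and compute the loss in embedding space rather than pixel space. The linear "encoders" and toy dimensions are placeholders of my own, not Meta's actual model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: a "video" is 8 patch vectors of dimension 16.
patches = rng.normal(size=(8, 16))
mask = np.array([0, 0, 1, 0, 1, 1, 0, 0], dtype=bool)  # patches to hide

# Random linear maps standing in for deep networks.
W_ctx  = rng.normal(scale=0.1, size=(16, 32))  # context encoder
W_tgt  = rng.normal(scale=0.1, size=(16, 32))  # target encoder
W_pred = rng.normal(scale=0.1, size=(32, 32))  # predictor

# Encode only the visible patches, then pool them into a context vector.
context = (patches[~mask] @ W_ctx).mean(axis=0)

# Predict the representation of each masked patch from the context,
# and compare against the target encoder's embedding of the true patch.
pred    = context @ W_pred        # predicted latent (shared here for simplicity)
targets = patches[mask] @ W_tgt   # ground-truth latents
loss = np.abs(pred - targets).mean()  # L1 loss in embedding space, not pixel space
print(f"latent-prediction loss: {loss:.4f}")
```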

That’s my take on it:

The driving force behind Meta's AI research is Yann LeCun, who draws inspiration from the developmental psychology of Jean Piaget. According to Piaget, humans exhibit an innate curiosity akin to that of infants, signifying a natural inclination toward exploration. Following this principle, a promising avenue for AI training is to allow it to explore autonomously. Although V-JEPA is currently at the conceptual stage, its potential success could have significant implications for researchers.

Today data are no longer limited to structured, numeric formats; videos also serve as raw data. Traditional text mining methods require the transcription of video content into textual format, a laborious and tedious process. If AI systems can "watch" videos and directly summarize and analyze their content, text mining could evolve into video mining!

Link:

https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/

Posted on February 16, 2024

DSML trend: Russia develops advanced weapons using neural networks

According to a report released by the Eurasian Times on Feb 15, 2024, Russian scientists have devised an advanced neural network technology called NAKA for drones, enabling the identification of enemy weapons such as Leopard tanks and Bradley IFVs. This development underscores Russia's efforts to strengthen its drone capabilities following vulnerabilities exposed in recent conflicts. Nonetheless, this neural network holds potential for peaceful civilian applications in agriculture and in locating lost animals across vast territories. Russia has also achieved other advancements in AI. Utilizing machine learning, technologies like the Lancet-3 and the Marker UGV demonstrate Russia's progress in military AI, enhancing autonomous target recognition and decision-making.

That’s my take on it:

In a 2017 conference with Russian schoolchildren, Vladimir Putin stated, "Artificial intelligence is the future, not only of Russia, but of all of mankind. Whoever becomes the leader in this sphere will become the ruler of the world." Commenting on Putin's statement, Gregory Allen, adjunct fellow at the Center for a New American Security, wrote, "in spite of Putin’s ambitious goals, Russia’s pursuit of AI domination is unlikely to come in the form of generating AI technological breakthroughs… However, Russia could be a leader in weaponizing AI in pursuit of its grand strategy, which is to end US hegemony in the international system and re-establish Russian influence over the former Soviet region." It remains uncertain whether Russia's new developments in AI weaponry will alter the course of the Russo-Ukrainian conflict, but Russia's determination is evident and alarming. Certain idealistic pacifists are against the weaponization of AI, including the deployment of killer robots. Nevertheless, it is a reality that adversaries of the US would proceed with such actions regardless. In my view, unilateral disarmament would not foster peace but rather encourage aggression.

 

Links:

https://www.cnn.com/2017/09/05/opinions/russia-weaponize-ai-opinion-allen/index.html

https://www.eurasiantimes.com/cameras-drones-with-russia/#:~:text='Neural%20Network'%20that%20Can%20Identify%20Enemy%20Tanks,%2C%20and%20any%20other%20vehicles.%E2%80%9

Posted on February 15, 2024

DSML trend: OpenAI announces the most powerful text-to-video generator

Today (Feb 15) OpenAI unveiled its latest innovation: Sora, an AI-powered text-to-video generator. Numerous examples showcased on the OpenAI website highlight Sora's capabilities, emphasizing that all videos were directly generated by the tool without any alterations. For instance, using a prompt describing a scene of a stylish woman strolling through a neon-lit Tokyo street, Sora produces photorealistic output that is virtually indistinguishable from real footage.

That’s my take on it:

Having watched all the demonstrations, I am very impressed. Sora stands out as the most powerful text-to-video generator I've seen thus far. As an educator, I see this technology as a blessing. Traditional video production typically demands proficiency in video editing software like Camtasia, Final Cut Pro, or iMovie. Sora's groundbreaking capabilities level the playing field, making the creation and updating of instructional videos far more accessible.

However, this innovation also poses challenges for legal systems. In the past, videos served as crucial evidence for reconstructing events and determining whether the accused is guilty or innocent. In an era where videos can be artificially generated, however, their credibility is called into question. I anticipate a future where forensic investigation of video content becomes a distinct academic discipline.

Moreover, there are potential implications for the adult entertainment industry, as AI-generated content could reduce the need for human performers, thereby cutting costs. While this may lead to fewer individuals being exploited in the porn industry, it raises ethical and regulatory concerns that demand urgent dialogue among ethics scholars and legal authorities.

 

Link to Sora:

https://openai.com/sora

Posted on February 14, 2024

DSML: Most popular programming languages in 2024

According to the February 2024 edition of the TIOBE index, Python is currently, as expected, the most popular programming language. The top 25 are as follows:

1. Python
2. C
3. C++
4. Java
5. C#
6. JavaScript
7. SQL
8. Go
9. Visual Basic
10. PHP
11. Fortran
12. Delphi/Object Pascal
13. MATLAB
14. Assembly language
15. Scratch
16. Swift
17. Kotlin
18. Rust
19. COBOL
20. Ruby
21. R
22. SAS
23. Classic Visual Basic
24. Prolog
25. Ada

That’s my take on it:

It's important to note that this compilation includes all programming languages, regardless of their intended purposes and applications. While Python, C, and C++ are recognized as general-purpose languages, others on the list serve specific domains, such as SQL, MATLAB, and SAS. SQL, for instance, is very powerful in the realm of database management and data manipulation. According to indeed.com, SQL ranks as the top data science job skill in demand (see the attached screen capture). Similarly, MATLAB finds primary usage among engineers and scientists for complex calculations, encompassing areas like linear algebra, statistics, and calculus, while SAS is widely employed in data analytics.
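
For readers new to SQL, the snippet below shows the kind of grouping and aggregation that makes it so valuable, run through Python's built-in sqlite3 module; the table and values are invented for illustration.

```python
import sqlite3

# In-memory database with an invented sales table, for illustration only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("West", 120.0), ("West", 80.0), ("East", 200.0)])

# The kind of grouping/aggregation query that makes SQL a core
# data-manipulation skill: total and average sales per region.
for row in con.execute("""
        SELECT region, SUM(amount) AS total, AVG(amount) AS mean
        FROM sales GROUP BY region ORDER BY total DESC"""):
    print(row)
```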

Notably, R does not secure a position within the top 20. However, it might be premature to dismiss its usefulness. According to R-bloggers.com, graphical versions of the R language have been gaining popularity within the scholarly community. In terms of the change in Google Scholar citation rates from 2019 to 2022, the fastest-growing data analytical software packages are BlueSky Statistics, Jamovi, and JASP. Remarkably, all three are essentially R repackaged with a graphical user interface (see attached screen capture). As a JASP user myself, I found that 90% of data analytical tasks can be accomplished without programming. In addition, this list does not offer a comprehensive view of the entire spectrum of data science software applications. Take Tableau as an example: it is the leading data visualization tool in industry, yet its operation predominantly relies on drag-and-drop and point-and-click interfaces, with coding playing a minimal role.

Link:

https://www.tiobe.com/tiobe-index/

Posted on February 11, 2024

DSML Trend: Gemini’s generative art tool beats Adobe, Midjourney, and Stable Diffusion at rendering exact words

Today marks the third day since the introduction of Google Gemini, which boasts numerous new and improved features. As a diligent researcher, I can only be convinced by evidence, and thus I continue to subject Google Gemini to rigorous acid tests. Within the realm of generative art, Gemini has proven its merits. Notably, many existing generative art tools struggle with accurately depicting text within images. I conducted a test by inputting the prompt "a sign showing the exact word 'Equal'" into Adobe, Midjourney, Stable Diffusion, and Gemini. Regrettably, the first three tools failed; the attached images depict their inability to render the word "Equal" correctly. They are not even close! Conversely, Google Gemini delivered what I asked, producing four exquisite images featuring the sign with the precise word "Equal."

Posted on February 9, 2024

DSML Trend: Now Google Bard is Gemini

Earlier today, an email from Google caught my attention with this subject line: "Bard is now Gemini." Yesterday (February 8) Google unveiled a rebranding of Bard, its AI chatbot and assistant. This update introduces subscription options and new apps. Bard has been renamed Gemini. Additionally, for those seeking advanced features, Google offers access to Gemini Ultra 1.0, its most potent AI model, for a monthly fee of $19.99 through Google One. Furthermore, Google has introduced alternative ways to use its AI tool: Android users can install a dedicated Gemini app, while iPhone users can access Gemini within the Google app on iOS.

That’s my take on it:

I asked Gemini this question: “Now Bard is called Gemini. Besides changing the name, what else? Are there new features in Gemini?” The answer is:

Multimodal capabilities: Gemini expands beyond just text. We can now process and understand information from different formats like images, audio, and video, opening doors to richer interactions and applications.

Enhanced coding: Gemini has improved its understanding of code. We can generate higher-quality code in diverse programming languages, explain existing code more effectively, and assist with complex coding tasks.

Deeper data analysis: We've learned to delve deeper into data, making connections and uncovering insights beyond basic summaries. This unlocks potential for data-driven research and problem-solving.

Mobile App: A dedicated Gemini app is now available! You can experience our capabilities on the go, with the initial version offering text generation and translation features.

Exclusive Features (Gemini Advanced): For users opting for the advanced version, a range of exclusive features are planned, including enhanced multimodal capabilities, more interactive coding tools, and deeper data analysis capabilities.

Scalability: Gemini comes in three sizes - Nano, Pro, and Ultra - catering to diverse needs and device environments. This increases accessibility and ensures smooth performance even on mobile devices.

Links:

https://www.cnbc.com/2024/02/08/google-gemini-ai-launches-in-new-app-subscription.html

Posted on February 9, 2024

DSML Trend: US announces AI Safety Institute Consortium (AISIC) with 200 members

On February 8, 2024, U.S. Secretary of Commerce Gina Raimondo unveiled the establishment of the U.S. AI Safety Institute Consortium (AISIC). Administered by the National Institute of Standards and Technology (NIST), this Consortium aims to bring together AI creators and users, academia, government and industry researchers, and civil society organizations to ensure the development and deployment of secure and reliable AI. NIST received an overwhelming response, with over 600 Letters of Interest from organizations spanning the AI stakeholder community and the United States. The consortium now boasts a membership exceeding 200 companies and organizations, including notable entities such as OpenAI, Alphabet's Google, Anthropic, Microsoft, Meta, Apple, Amazon, Intel, JP Morgan Chase, Bank of America, Cisco, IBM, HP, and Qualcomm.

That’s my take on it:

The United States has historically forged technology consortiums and alliances across various sectors in response to foreign competition, with varying results. For instance, in the 1980s, the creation of Sematech, a partnership between U.S. semiconductor companies and the government, aimed to reclaim leadership in semiconductors from Japan. But today the U.S. semiconductor sector is still outperformed by its Asian counterparts, such as TSMC. Similarly, despite the formation of the U.S. Advanced Battery Consortium in 1991, today China has the upper hand in EV batteries. However, the landscape of AI differs significantly. The United States has maintained a leading position in AI, with no imminent threat from foreign rivals. Hence, it is my contention that this consortium will further solidify the U.S.'s leadership in reliable, trustworthy, and responsible AI.

Links:

https://www.nist.gov/artificial-intelligence/artificial-intelligence-safety-institute

https://9to5mac.com/2024/02/08/ai-safety-institute-consortium-apple/

Posted on February 4, 2024

DSML Trend: Google Bard (Gemini Pro) is ranked second by LMSYS

That’s my take on it:

In the competitive landscape of AI, numerical rankings reveal only part of the story. As a user of ChatGPT, Google Bard, and Claude, I found that both ChatGPT and Claude are more user-friendly than Google Bard. Specifically, while I can directly copy and paste text from external sources into the input box of ChatGPT and Claude, pasted text in Google Bard turns into an attached file, hindering editing. Additionally, in the context of rewriting and paraphrasing, Google Bard tends to introduce excessive and redundant words and sentences, a concern not shared by the other two chatbots. Rather than fixating on the race for numerical superiority, it might be more valuable to assess how the features of these chatbots benefit users in tasks such as writing, translation, data analysis, and code generation.

https://aibusiness.com/nlp/google-s-bard-just-beat-gpt-4-in-chatbot-rankings

Posted on February 2, 2024

DSML trend: Google’s Imagen 2 enters the AI generative art market

That’s my take on it:

In a direct comparison between Google's Imagen 2 and Midjourney using the same prompt, "A spaceship is exploring Europa, the moon of Jupiter," Google Bard's output was disappointing. While Midjourney generated images comparable to sci-fi movie posters, Google's images lacked detail and sophisticated design. In one of the Google images, the shape of Jupiter is not even a sphere (see attached). Another prompt, "a belly dancer in a palace, hyper-realistic, tack sharp focus," yielded similar results. Midjourney delivered precisely what the prompt described, a hyper-realistic and sharply focused image, while Google's image quality fell short (see attached). As expected, Google is a latecomer to this field, and further improvements can be anticipated.

Link: https://deepmind.google/technologies/imagen-2/

Posted on February 2, 2024

DSML trend: Comparing Gradient Boosting Machines and Neural Networks

On February 1, 2024, Jacky Poon, the head of actuarial and analytics at nib Group, authored an article comparing the advantages and disadvantages of Gradient Boosting Machines (GBMs) and neural networks in tabular data applications. While some assert that GBMs outperform neural networks in prediction, Poon delved deeper, considering additional factors like interpretability, training time, and inference speed. His findings indicated that there is no clear winner.

According to Poon, although neural networks are more resource-intensive, their ability to be trained with batch-loaded data enables processing of datasets too large for memory. Additionally, the inference speed comparison between GBMs and neural networks depends on the number of parameters used. In terms of deployment, Poon observed that both GBMs and neural networks can be easily implemented. Ultimately, Poon encouraged users to explore both approaches to identify the most suitable solution.
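
Poon's advice to try both approaches is easy to follow in practice. Here is a minimal sketch using scikit-learn that fits a GBM and a small neural network on the same synthetic tabular data and reports two of his criteria, accuracy and training time; the dataset and hyperparameters are arbitrary choices for illustration.

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic tabular data standing in for a real business dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [
    ("GBM", GradientBoostingClassifier(random_state=0)),
    ("Neural net", MLPClassifier(hidden_layer_sizes=(64, 32),
                                 max_iter=500, random_state=0)),
]:
    start = perf_counter()
    model.fit(X_tr, y_tr)           # training time is one of Poon's criteria
    elapsed = perf_counter() - start
    acc = model.score(X_te, y_te)   # predictive accuracy is another
    print(f"{name}: accuracy={acc:.3f}, train time={elapsed:.2f}s")
```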

That’s my take on it:

Jacky Poon's article offers a fair and timely evaluation. Neural networks have historically been viewed as a last resort due to their demanding computing resources, complexity, and lack of transparency (the "black box" problem). However, advancements in hardware, such as cloud-based high-performance computing, and improved algorithms have mitigated these concerns. Adhering to the methodology of inference to the best explanation, I consistently employ model comparison to determine the optimal solution.

Link: https://www.theactuary.com/2024/02/01/data-science-lab-gbms-or-neural-networks

Posted on January 26, 2024

DSML trend: The most popular AI tools in 2022-2023

The 2024 Global Forecast Series report, released on January 24, 2024, highlights the most widely used AI tools of 2023, with popularity gauged based on the number of visits.

AI tool: total web visits, September 2022 to August 2023 (share of total)

1. ChatGPT: 14.6B (60.2%)
2. Character.AI: 3.8B (15.8%)
3. QuillBot: 1.1B (4.7%)
4. Midjourney: 500.4M (2.1%)
5. Hugging Face: 316.6M (1.3%)
6. Google Bard: 241.6M (1.0%)
7. NovelAI: 238.7M (1.0%)
8. CapCut: 203.8M (0.8%)
9. JanitorAI: 192.4M (0.8%)
10. CivitAI: 177.2M (0.7%)

That’s my take on it:

This distribution is extremely skewed, with ChatGPT dominating by a percentage greater than the combined total of all the others (the winner takes all, almost). Surprisingly, Claude.AI does not rank within the top ten. Claude surpasses ChatGPT on certain benchmarks but falls short on others. In terms of the Massive Multitask Language Understanding (MMLU) score, both Claude 1 and Claude 2 outperform ChatGPT 3.5 (77 and 78.5 versus 70). However, the paid version of ChatGPT (version 4) demonstrates superiority over Claude with a score of 86.4. Notably, Claude generates responses from a closed database, lacking access to the internet for updates, while the Bing version of ChatGPT serves as an active search engine. Looking ahead, I anticipate that ChatGPT will maintain its popularity over other models in the foreseeable future.

The findings of this report diverge from another source, however. As per Everypixel.com, as of August 2023 Stable Diffusion had generated 12.5 billion images, surpassing Midjourney's 964 million. Variations in measurement indices can yield disparate outcomes.

Links:

https://journal.everypixel.com/ai-image-statistics

Posted on January 19, 2024

During the World Economic Forum held in Davos, Switzerland, AI emerged as a hot topic of discussion. For example, Jeff Maggioncalda, the CEO of Coursera, highlighted that in 2023, on average, a new user enrolled in one of its AI courses every minute. Coursera aims to collaborate with leading AI players, such as OpenAI and Google's DeepMind, to offer comprehensive AI courses. Despite initial investor concerns that generative AI apps might replace ed-tech firms, the technology has actually spurred increased upskilling, benefiting platforms like Coursera.

At the same forum, UN Secretary-General António Guterres expressed concerns about the heightened risks of unintended consequences associated with AI. He urged the tech industry to collaborate with governments in establishing regulations and guidelines for responsible AI development. Additionally, Guterres acknowledged the enormous potential of AI for sustainable development, but cited a recent warning from the International Monetary Fund suggesting that AI could exacerbate existing inequalities.

That’s my take on it:

Apprehension regarding unintended consequences of technology is not a new phenomenon. As far back as 1818, Mary Shelley's novel "Frankenstein" raised ethical concerns about scientific experimentation and its potential undesirable outcomes. I share the belief that AI, being a powerful force, requires careful consideration and regulation to avoid creating something like "Frankenstein." This is why I advocate for the inclusion of a data ethics course in data science programs.

While acknowledging the potential risks, I view AI as a liberating and equalizing tool. Contrary to worsening inequality, the surge in individuals taking AI courses, as noted by Jeff Maggioncalda, suggests a leveling of the playing field. For example, AI has democratized the creation of high-quality images and videos, eliminating the need for substantial financial investments in professional studios. In the past, affluent and middle-class parents had the means to hire personal tutors for their children, while disadvantaged children lacked such opportunities. But today, any student with access to a computer can leverage tools like ChatGPT, Claude, Google Bard, or similar AI technologies for personalized tutoring. Hence, AI improves equality!

Links to articles about World Economic Forum:

https://www.reuters.com/technology/chatgpt-effect-coursera-sees-signups-ai-courses-every-minute-2023-2024-01-18/

https://www.theguardian.com/business/2024/jan/17/big-tech-firms-ai-un-antonio-guterres-davos?utm_source=ONTRAPORT-email-broadcast&utm_medium=ONTRAPORT-email-broadcast&utm_term=Newsletter&utm_content=Data+Science+Insider%3A+January+19th+%2C+2024&utm_campaign=20012024

Posted on January 12, 2024

DSML trend: AI is everywhere in CES

The Consumer Electronics Show (CES) was held in Las Vegas last week. As you might expect, the convention was dominated by AI innovations. J.H. Han, CEO and head of the device experience division at Samsung, emphasized the transformative impact of AI on various industries, making lives more convenient and inclusive. From giant televisions to robots, electric vehicles, and foldable phones, AI integration was evident in a wide range of products. Notably, some companies showcased products capable of detecting not only human emotions but also those of pets. LG even presented an AI-powered robot companion with the ability to call for an ambulance in case of a fall at home.

That’s my take on it:

I didn't personally attend CES, so my impressions are based on second-hand information. It appears that the event featured a mix of groundbreaking developments, conceptual products, and potentially unnecessary innovations. For instance, AI software startup Capella introduced an app, priced at $10 per month, claiming to interpret a baby's cries with "95% accuracy" to determine whether the baby is hungry, needs a diaper change, or is uncomfortable. The practicality of such technology raises questions; in my opinion, any loving parent can perform these assessments intuitively. Another example is the $3,500 Perfecta grill from British startup Seergrills, which supposedly uses AI to cook perfect steaks and other meats in just 90 seconds. Again, I argue that any experienced cook can achieve similar results without such costly equipment.

Nevertheless, the diversity of ideas presented at CES reflects the innovation and experimentation within the tech industry. Ultimately, the market will decide the fate of these concepts, determining which ones thrive and which fall by the wayside.

Links:

https://www.cnet.com/tech/ai-is-dominating-ces-2024-you-can-blame-chatgpt-for-that/

Posted on December 15, 2023

DSML Trend: DeepMind’s FunSearch discovers new knowledge

 

Recently researchers from Google DeepMind claimed a groundbreaking achievement—the world's first scientific discovery facilitated by a large language model. This breakthrough suggests that technologies like ChatGPT have the potential to generate information surpassing human knowledge. The development, called "FunSearch" (short for "searching in the function space"), leverages a Large Language Model (LLM) to devise computer-program solutions to various problems. Paired with an "evaluator" that ranks program performance, the best solutions are amalgamated and fed back into the LLM, propelling an iterative process that transforms weak programs into robust ones capable of unveiling new knowledge.
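For readers who want a concrete picture of that loop, here is a minimal sketch in Python. The helper names (llm_propose, evaluate) are hypothetical placeholders standing in for an LLM call and a problem-specific scorer; this illustrates the idea rather than DeepMind's actual code:

    def funsearch_loop(llm_propose, evaluate, seed_programs, iterations=1000, pool_size=10):
        # Maintain a pool of (score, program) pairs.
        pool = [(evaluate(p), p) for p in seed_programs]
        for _ in range(iterations):
            # Keep only the strongest programs found so far.
            pool = sorted(pool, key=lambda sp: sp[0], reverse=True)[:pool_size]
            parents = [program for _, program in pool[:2]]  # best programs become prompt context
            candidate = llm_propose(parents)                # the LLM drafts a new program from them
            try:
                score = evaluate(candidate)                 # the automated evaluator ranks it
            except Exception:
                continue                                    # discard programs that crash
            pool.append((score, candidate))
        return max(pool, key=lambda sp: sp[0])              # best (score, program) discovered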

 

In a noteworthy accomplishment, AI, through FunSearch, tackled a longstanding mathematical challenge—the cap set problem. This problem involves identifying the most extensive set of points in space where no three points align in a straight line. FunSearch produced programs generating new large cap sets, surpassing the best-known solutions devised by mathematicians.

 

That’s my take on it:

During an interview with CNN, renowned physicist Michio Kaku derided chatbots, likening them to a "glorified tape recorder." However, contrary to the criticism that chatbots merely recycle existing data without generating new knowledge and are prone to confabulation, the preceding advancement showcases the potential for AI to contribute to knowledge creation. In addition, while chatbots lack the ability to conduct original research or independent experiments, they can aid in hypothesis generation. By sifting through vast datasets, recognizing patterns, and formulating hypotheses, AI can offer valuable insights. For instance, analyzing medical records for potential symptom-disease relationships or studying financial data to predict market trends—akin to the Swanson process—demonstrates AI's capacity to contribute meaningfully to the creation of new knowledge.

Link to article:

https://www.theguardian.com/science/2023/dec/14/ai-scientists-make-exciting-discovery-using-chatbots-to-solve-maths-problems?utm_source=ONTRAPORT-email-broadcast&utm_medium=ONTRAPORT-email-broadcast&utm_term=Newsletter&utm_content=Data+Science+Insider%3A+December+15th%2C+2023&utm_campaign=16122023

Posted on December 8, 2023

DSML trend: Google Gemini may outperform ChatGPT

Gemini, released by Google two days ago (December 6), is considered a quantum leap in AI innovation. Gemini comes in three versions tailored for specific tasks. Gemini Ultra is the most powerful variant, intended for handling incredibly complex tasks with its multimodal capabilities, whereas Gemini Pro is designated for powering Google's consumer-level products operating in the cloud, such as Google Bard and other PaLM 2 products. Lastly, Gemini Nano is specifically crafted to operate natively on mobile devices, such as cell phones. According to Google, Ultra demonstrated superior performance compared to "state-of-the-art" AI models, including ChatGPT's most advanced model, GPT-4, across 30 out of 32 benchmark tests. Additionally, the Pro model surpassed GPT-3.5, the underlying technology for the free-access version of ChatGPT, in six out of eight tests. The driving force behind Gemini is DeepMind co-founder Demis Hassabis, who advocates the integration of LLMs and other AI techniques to enhance comprehension.

That’s my take on it: While I haven't personally experimented with Gemini yet, third-party analyses suggest that it has the potential to surpass ChatGPT. First, in contrast to conventional large language models (LLMs) that are predominantly text-centric, Gemini stands out as a natively multimodal model, displaying proficiency in learning from a diverse array of data sources, including audio, video, and images. This breakthrough transcends the text-focused constraints of LLMs, hinting at a potential paradigm shift in the capabilities of AI products. Second, Gemini reportedly undergoes training on more extensive datasets of text and code, ensuring that the AI model stays updated with the latest information and can provide accurate and highly reliable responses to queries. Moreover, the model can also generate hypotheses for further research, a capability that experts believe could revolutionize scientific discovery and potentially lead to breakthroughs in fields such as technology and medicine.

https://www.theguardian.com/technology/2023/dec/06/google-new-ai-model-gemini-bard-upgrade

https://www.youtube.com/watch?v=lgBAS9CFYlE

Posted on November 21, 2023

DSML trend: New and updated features of Bard level the playing field

Recently Google Bard announced several new or updated features. For example, “Get help with math equations: Stuck on a math problem? Ask Bard for a step-by-step explanation of how to solve the equation. You can even take a photo of the question and upload it instead of typing it out.”

“Charts & graphs to visualize data: Bard can now generate charts from data or equations you include in your prompts or from tables that Bard generates during your conversations. We even made a graph resembling the Bard sparkle while playing around with this new feature!”

That’s my take on it:

I tested the feature by entering the following problem: “Solve 9^(2x-5) = 27^x. Explain the solution step by step.” After a few seconds, Bard presented two versions of the solution, and both were correct. More importantly, it explained the procedure step by step. In my opinion, the explanation is even clearer than a textbook's. If I need further explanation, Bard can provide additional information. For instance, after I entered “Please explain the power of a power rule used in Step 2,” Bard illustrated more details.
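For reference, the algebra behind that problem is short. Since 9 = 3^2 and 27 = 3^3, the equation 9^(2x-5) = 27^x can be rewritten as (3^2)^(2x-5) = (3^3)^x. Applying the power-of-a-power rule (multiply the exponents) gives 3^(4x-10) = 3^(3x); equating the exponents yields 4x - 10 = 3x, hence x = 10.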

A chatbot like Bard has the potential to serve as your personalized and intelligent tutor, catering to your individual learning pace, academic proficiency, and preferred learning style. In the past, parents had to invest significantly in hiring private tutors or enrolling their children in intensive boot camps. However, the playing field has been leveled, and now access to a personalized tutor is just a few clicks away for virtually anyone. I wish I had the opportunity to access this technology when I was a child! If so, my knowledge could have tripled, and I might have earned three Ph.Ds!

Google Bard:

https://bard.google.com/chat

Posted on November 17, 2023

DSML trend: OpenAI fires Sam Altman

It happened just now. Today (November 17, 2023) OpenAI's board of directors announced that Sam Altman will be stepping down as CEO, with technology chief Mira Murati set to take over the position. The decision comes after a thorough review process, during which the board determined that Altman's communication lacked consistent truthfulness, thereby impeding the board's ability to fulfill its responsibilities. The statement emphasized that due to this, the board no longer has confidence in Altman's capacity to effectively lead OpenAI. The board also announced that Greg Brockman, OpenAI's president, will be stepping down as chairman of the board but will keep a role at the company.

That’s my take on it: As of now, Altman has not issued any public response. Given the maturity of the technology, I believe the departure of both Altman and Brockman will likely have minimal impact on the development of OpenAI or the broader field of generative AI. However, it's improbable that Altman will sit idle. He may embark on launching another startup or join a competitor to OpenAI (e.g., Anthropic's Claude or Google Bard?)

News: https://www.cnbc.com/2023/11/17/sam-altman-leaves-openai-mira-murati-appointed-interim-boss.html

Posted on November 17, 2023

The Harvard Business Review featured an article on November 2, 2023, titled "How Cloud Technology is Transforming Data Science." Written by Peter Wang, the CEO and co-founder of Anaconda, the article discusses the impact of cloud computing on data science practices. Wang highlights how cloud platforms, such as IBM Watson and Tableau, are revolutionizing the field by offering scalable computational resources and enhancing workforce agility. These cloud-based analytics tools empower teams to access information and collaborate in real time, facilitating quicker insights and problem-solving. Moreover, cloud computing promotes inclusivity in data science by providing smaller entities, such as startups and small teams, with the means to innovate on par with larger corporations. The cloud's collaborative capabilities extend to distributed data science teams, enabling effective collaboration irrespective of geographical constraints. While the cloud brings forth significant advancements in data science, it also introduces new challenges, particularly in data privacy and security. To address these concerns, Wang emphasizes the importance of employing techniques like data partitioning, encryption, and robust frameworks for mitigation.

That’s my take on it: Given the significance of cloud computing, it is undoubtedly essential to integrate it into the curriculum of data science education. However, the current landscape of the cloud computing market is highly diverse, featuring numerous vendors such as AWS, Google Cloud, Microsoft Azure, IBM Watson, and more. This question arises: should cloud computing training be tailored to specific vendors or remain vendor-independent? Opting for vendor-specific training allows students to gain practical experience with the tools and services of major cloud providers like AWS, Azure, and IBM Watson, preparing them for roles utilizing these platforms. The drawback is that knowledge becomes less transferable if students later work with a different cloud provider. On the other hand, adopting a vendor-independent approach ensures knowledge transferability across various cloud platforms and avoids explicit promotion of specific vendors within the program. However, graduates may need additional, vendor-specific training upon entering the workforce. Striking a balance between these approaches is crucial to provide students with a well-rounded and adaptable skill set in the dynamic field of cloud computing. What do you think?

Full article: https://hbr.org/2023/11/how-the-cloud-is-changing-data-science?utm_source=ONTRAPORT-email-broadcast&utm_medium=ONTRAPORT-email-broadcast&utm_term=Newsletter&utm_content=Data+Science+Insider%3A+November+17th%2C+2023&utm_campaign=18112023

Posted on October 19, 2023

On October 16, Baidu, the Chinese search engine giant, unveiled its updated large language model, known as Ernie 4. They asserted that it is on a par with OpenAI's GPT-4 in terms of performance, although it is not yet accessible to the public. Additionally, Baidu introduced a new AI-based product called Baidu GBI, developed from the ground up to provide support for natural language interaction and handle cross-database analysis, among other functions. According to Baidu's CEO Robin Li, this product has the capability to complete data analysis tasks that would take humans several days in just a matter of minutes.

That’s my take on it:

Baidu had previously released Ernie 3.5 in June, claiming superior performance compared to OpenAI's ChatGPT 3.5 and even surpassing GPT-4 in certain Chinese-language skills. These assertive statements should undergo validation through objective benchmark tests by independent parties.

Full report: https://finance.yahoo.com/news/baidu-says-ai-chatbot-ernie-093000101.html?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAAE5TgYsRNuuI8tqvuAdTuTlhUfPwDoUBK0Lqn7T3wds300KRdiCn07yvxheXmcF_B9oYQvNKHelI8pJ5IZguMCKXX6nmqx38LMinQTTbHJlY5ZLy0dyIWs4RE-EDa8DaC-2QlyMxwSzOaf1mwniCaUSEar4f2AWL72gf1wWb_x9i

Posted on October 13, 2023

Yesterday (10/12) an article published by Analytics Insight described seven data science roles that can be performed without programming skills, relying instead on the capabilities of user-friendly software tools like Tableau, Excel, Power BI, and more. These positions are:

·      Data analyst

·      Business Intelligence Analyst

·      Data Consultant

·      Market Research Analyst

·      Data Visualization Specialist

·      Data-driven Strategist

·      Data Product Manager

That’s my take on it:

No-code solutions provide pre-built components, templates, and graphical user interfaces (GUIs) that can accelerate development compared to programming. These tools allow users to focus more on the research question, the data, and the business logic rather than the syntax. However, no-code solutions inevitably involve some trade-offs in terms of flexibility, customization, scalability, and performance compared to coding. Data science education should balance both sides. In my humble opinion, starting data science training with a focus on programming right away might not be the most advisable approach. Emphasizing the fundamental concepts as the foundation is crucial, while the tools, which serve as means to an end, should be treated as secondary. Leveraging GUI-based software applications lowers the entry barriers into the field, thereby broadening the pool of potential talent.

Article:

https://www.analyticsinsight.net/7-data-science-jobs-you-can-do-without-any-coding-skills/?utm_source=ONTRAPORT-email-broadcast&utm_medium=ONTRAPORT-email-broadcast&utm_term=Newsletter&utm_content=Data+Science+Insider%3A+October+13th%2C+2023&utm_campaign=14102023

Posted on September 29, 2023

DSML trend: Meta and OpenAI announced new features simultaneously

On September 27th, Meta unveiled its latest artificial intelligence (AI)-powered creation, Meta AI. This new digital assistant is Meta's response to OpenAI's ChatGPT and is set to seamlessly integrate with Instagram, Facebook, WhatsApp, and, in the future, Meta's mixed reality devices. Beyond merely answering questions and engaging in conversations with users, this freshly introduced bot boasts a remarkable ability to generate images. This image generation capability is harnessed through a novel tool known as Emu, which Meta has diligently trained on a vast dataset of 1.1 billion pieces of data, including user-shared photos and captions from Facebook and Instagram. Rather than pursuing a one-size-fits-all approach, Meta's overarching strategy involves crafting distinct AI products tailored for various use cases.

On the very same day, OpenAI made an exciting announcement regarding its chatbot, ChatGPT. It revealed that ChatGPT would no longer be constrained by pre-2021 data. Users now have the option to explore GPT-4 via a novel feature called "Browse with Bing." For instance, if you snap a photo of your home refrigerator's contents, ChatGPT can provide recipe suggestions. Similarly, if you photograph your children's math homework, ChatGPT can assist in solving mathematical problems. Furthermore, OpenAI is set to enable ChatGPT to engage in voice conversations with users and interact with images, bringing it closer to an AI voice assistant akin to Apple's Siri.

That’s my take on it:

Ultimately, these AI tools hold the potential to enhance human well-being and satisfaction. However, will our happiness increase when AI can assist us in generating images on Facebook, preparing a recipe, or solving complex math problems? When I traveled to Europe, I found that many European drivers preferred the manual transmission to the automatic one, believing the former provides a more fulfilling driving experience while the latter deprives them of the fun of driving. By the same token, if everything becomes too easy with help from AI, will we feel that we are losing some enjoyable experiences? The psychological impact of AI on humans is a topic that warrants extensive study.

Full text:

https://cointelegraph.com/news/meta-ai-ceo-mark-zuckerberg-unveil-chatbot-rayban-metaverse

https://help.openai.com/en/articles/6825453-chatgpt-release-notes

https://www.worldjournal.com/wj/story/121148/7473581?from=wj_nextnews_story

Posted on September 27, 2023

DSML trend: Capital Economics report of AI impact

On September 26, Capital Economics published a report under the title "AI, Economies, and Markets – The Transformation of the Global Economy by Artificial Intelligence." The report highlights that revolutionary technologies like GPT do not necessarily guarantee substantial productivity gains. Historical evidence shows that the productivity improvements stemming from groundbreaking technologies have often been gradual and less dramatic than initially anticipated. Economists have long grappled with the perplexing trend of weak productivity growth in the digitalized economy of recent decades, marked by developments such as the internet, cloud computing, and the Internet of Things. One major contributing factor is that many companies do not promptly or effectively implement the technology. Although the United States has notably reaped the most substantial productivity gains from AI, achieving a substantial productivity boost from AI hinges on several co-occurring factors, including increased investment, workforce reskilling, and a well-balanced regulatory framework.

That’s my take on it:

Drawing inspiration from the 1966 Clint Eastwood movie, "The Good, the Bad, and the Ugly," I can see that nearly every innovation can yield three potential outcomes. The "good" outcome represents effective and efficient applications of the technology, leading to desirable results. Conversely, the "bad" outcome arises from poor implementations, resulting in wasted resources. Finally, the "ugly" outcome involves the misuse of technologies for malicious purposes. Numerous examples of the "bad" category exist, such as the overuse of word processing, which allows endless and unnecessary document editing, often resulting in little improvement despite numerous iterations. Similarly, the availability of powerful data analysis software can lead to redundant and excessive statistical tests, with minimal productivity growth as the outcome. These pitfalls also apply to AI unless users are adequately trained to harness its potential effectively.

Request complimentary report: https://www.capitaleconomics.com/key-issues/economic-impact-artificial-intelligence

Posted on September 22, 2023

DSML trend: OpenAI announced DALL-E 3

OpenAI has unveiled the third iteration of its AI art platform, DALL-E. Reviews have praised its significant improvements, with some areas of functionality surpassing that of Midjourney, particularly in terms of image sharpness. Notably, DALL-E 3 simplifies the creative process, eliminating the need for prompt engineering; even amateurs can now obtain desired images through straightforward prompts. Furthermore, this release seamlessly integrates with ChatGPT, streamlining the creative workflow. OpenAI has also placed a strong emphasis on robust safety measures to prevent the generation of inappropriate or harmful content, such as prohibiting the creation of images of public figures. However, DALL-E 3 is not expected to be publicly available until October.

That’s my take on it:

As for DALL-E's competitors, including Stable Diffusion and Midjourney, it is anticipated that their developers are tirelessly working to enhance their features. The ultimate winner in this competition may not become clear for another decade. To draw a parallel, we can reflect on the history of computing: Novell Netware dominated the network operating system landscape in the 1980s and early 1990s, but Microsoft's introduction of Windows NT Server in 1993 led to a gradual shift in market share, with Windows Server ultimately becoming the dominant server OS by the early 2000s, marking a decade-long competitive process. Similarly, Lotus 1-2-3 was launched in 1983 and quickly became the dominant spreadsheet software in the 1980s. In response, Microsoft introduced Excel for Mac in 1985 and the Windows version in 1987. It took approximately a decade for Excel to definitively overtake Lotus as the top-selling spreadsheet software by 1995. This pattern of a 10-year competitive evolution can also be observed in the cases of MS Word vs. WordPerfect and SAS/SPSS vs. BMDP.

Introduction to DALL-E 3 on YouTube: https://www.youtube.com/watch?v=27R0d1xUK9Y

Posted on September 15, 2023

DSML trend: Valuable data science certifications

Yesterday (September 14, 2023) Aileen Scott, a data scientist, released an article on Data Science Central titled "Are Data Science Certifications the Key to Unlocking Lucrative Opportunities?" In this short article, Aileen poses a question in her title and unequivocally answers it with a resounding "yes." According to her insights, while you can certainly pursue data science studies from the comfort of your home through online courses, opting for a certification program offers unique advantages by facilitating connections with fellow learners, instructors, and industry luminaries. The bottom line is: Earning a data science certification can significantly enhance your earning potential when compared with your non-certified peers. In Aileen’s view, the top choices of data science certification programs for 2023 are:

·      SAS Certified Data Scientist

·      Senior Data Scientist (SDSTM) by Data Science Council of America

·      Open Certified Data Scientist (Open CDS)

·      Microsoft Certified: Azure Data Scientist Associate

That’s my take on it:

While Aileen's provided list is concise, some may contend that it overlooks certain valuable programs, such as certifications for Amazon Cloud and Tableau. Nevertheless, Aileen's recommendations encompass two certification programs of a more generic or open-source nature, while the other two are product-specific or affiliated with particular companies (SAS and Microsoft). Although the open-source approach to data science and machine learning is gaining popularity, it is crucial to note that major corporations and tightly regulated industries continue to rely on proprietary software solutions due to their enhanced support and dependability. If you are in search of a data science training program, it is advisable to consider enrolling in one that provides both open-source and proprietary software tracks. In this regard, I encourage you to explore the data science program at Hawaii Pacific University. 

Posted on September 14, 2023

Today marks the second day of the 2023 Dreamforce conference, which is being hosted by Salesforce. It was a great experience even though I attended the conference remotely. Salesforce is widely recognized for its exceptional data visualization platform, Tableau, as well as its AI-driven analytical tool, Einstein. The central theme of this conference revolves around the concept of trust. Specifically, Salesforce is dedicated to constructing reliable systems that prioritize security, compliance, and dependability.

Throughout the conference, Salesforce has showcased its ability to guide users in creating more effective prompts through the innovative feature known as prompt tuning. Moreover, the event has featured numerous enlightening and captivating sessions. For instance, it has provided a platform for interviews with several distinguished AI leaders and innovators who have been acknowledged by TIME 100. Among these esteemed interviewees is Dr. Fei-Fei Li. During her interview, Dr. Li openly expressed her wholehearted embrace of this transformative technology. While some individuals may be skeptical of this powerful yet unfamiliar technology, Dr. Li made a thought-provoking comparison. She pointed out that today, we are not overwhelmed by electricity, and we readily use medications like Tylenol despite not fully comprehending their chemical composition. Addressing concerns about AI bias, Dr. Li contended that AI can be harnessed to mitigate bias. As an example, AI can scrutinize instances where male actors receive more screen time than their female counterparts, highlighting disparities and providing an avenue for rectification.

That’s my take on it:

Critics have voiced concerns that AI tools might inadvertently encourage laziness and plagiarism. However, it is undeniable that AI is here to stay. The integration of AI into various industries is inevitable, and skills related to AI, such as prompt engineering, are increasingly being recognized as indispensable.

Salesforce, as the world's third-largest software company and the second largest in Japan, wields significant influence in this technological landscape. As high-tech companies like Salesforce incorporate prompt tuning into their product portfolios, it is foreseeable that in the near future the utilization of prompt engineering will become as ubiquitous as the use of smartphones and tablets.

Dr. Fei Fei Li's compelling metaphors, likening AI to electricity and Tylenol, underscore the notion that embracing transformative technologies is a natural progression of human innovation. This phenomenon is not dissimilar to the initial opposition encountered by calculators, which were once believed to diminish human numerical skills. Today, they are as commonplace as electricity and Tylenol, illustrating how society adapts and integrates new tools into everyday life.

Conference’s website: https://www.salesforce.com/dreamforce/

Posted on September 8, 2023

DSML trend: TIME’s top 100 most influential people in AI

On September 7 TIME announced the top 100 most influential people in the field of AI, which includes Sam Altman of OpenAI, Dario and Daniela Amodei of Anthropic, Demis Hassabis of Google DeepMind, co-inventor of the backpropagation algorithm Geoffrey Hinton, inventor of CNN Yann LeCun, co-founder and chief AGI scientist of Google DeepMind Shane Legg, co-founder and president of OpenAI Greg Brockman, co-founder and chief scientist of OpenAI Ilya Sutskever, co-founder of Schmidt Futures Eric Schmidt, science fiction writer Ted Chiang, co-founder of Nvidia Jensen Huang, and Stanford professor Fei-Fei Li, among others.

That’s my take on it:

In TIME's top 100 list, Google boasts six individuals, while OpenAI is represented by five. In addition, Microsoft is notable with four entries, Intel with two, and Meta Facebook features one (Yann LeCun). There's also a presence from xAI (Elon Musk) and SalesForce/Tableau (Clara Shih), but curiously, Apple is entirely absent from the list. Surprisingly, there's no representation from Stability AI, the company behind the groundbreaking Stable Diffusion technology that generates artworks. No doubt Apple has lagged behind in AI development, with little noteworthy AI-related news to date. As of September 6, reports suggest that Apple is investing millions of dollars a day to train its own AI model Ajax, with claims that Ajax can surpass ChatGPT. However, concrete results are yet to be unveiled. I wonder how Apple's AI landscape might have been different had Steve Jobs still been alive today.

Time’s website: https://time.com/collection/time100-ai/

Apple’s story: https://www.theverge.com/2023/9/6/23861763/apple-ai-language-models-ajax-gpt-training-spending

Posted on September 6, 2023

DSML trend: Guardian blocks ChatGPT from accessing its content

On September 1, 2023, the Guardian announced its decision to block access to its content for the AI text generation program ChatGPT. In a statement, the publisher emphasized that the scraping of their intellectual property for commercial purposes has always been against their terms of service. They also highlighted their commitment to fostering mutually beneficial commercial relationships with developers worldwide. Other news media, including CNN, Reuters, the Washington Post, Bloomberg, and the New York Times, have implemented similar policies. OpenAI, the owner of ChatGPT, had previously revealed an opt-out option for website owners who didn't want their content used by AI algorithms.

That’s my take on it:

Whether AI's utilization of existing published content constitutes copyright infringement or qualifies as fair use has been an ongoing debate. This same issue extends to AI art tools like Midjourney and Stable Diffusion. It is important to note that AI chatbots do not simply copy and paste content from the source. Rather, the nature and purpose of its use can be seen as transformative, meaning that AI repurposes copyrighted material in a novel and distinct manner. Similarly, AI art tools do not merely create collages; instead, they learn from patterns in existing artworks to generate entirely new images. Consider this analogy: If I extensively study art by browsing around a library and a museum, and subsequently, based on this knowledge I write a new article or create a new painting on my own, should the library or museum prevent me from accessing their information?

Full text:

https://www.theguardian.com/technology/2023/sep/01/the-guardian-blocks-chatgpt-owner-openai-from-trawling-its-content

Posted on September 1, 2023

DSML trend: China's Baidu makes its AI chatbot Ernie Bot publicly accessible

On August 31, 2023, Baidu, the Chinese search engine and AI company, made a significant move by unveiling "Ernie Bot," its equivalent of the ChatGPT language model, to the public. As a result, Baidu's stock price surged more than 3%. This strategic move aligns with Beijing's vision of AI as a critical field, one where they aim to challenge the US and emerge as a global leader by 2030. By releasing Ernie Bot to the public, Baidu intends to gather extensive real-world user feedback. This feedback loop will, in turn, play a vital role in enhancing Ernie and strengthening Baidu's foundational models. Coincidentally, on the same day, two other prominent AI companies in China, Baichuan and Zhipu AI, also introduced their own AI language models.

That’s my take on it:

Back in 2017, Russian President Putin emphasized the transformative potential of AI by saying “whoever becomes the leader in this sphere will become the ruler of the world.” This perspective reflects the ongoing international competition among major technological powers, including the US and China, to gain supremacy in AI research and development.

Unfortunately, I faced challenges registering with Baidu's AI, as it requires a Chinese cell phone number for access. Consequently, I was unable to evaluate Ernie Bot personally. However, those who did manage to access Baidu's AI encountered significant restrictions, particularly in its reluctance to answer sensitive political and historical inquiries. If you had the opportunity to assess Ernie Bot, I would greatly appreciate it if you could share your insights and findings with me. If you found a way to circumvent the Chinese cell phone requirement when registering for Ernie Bot, please let me know too.

Full text:

https://techxplore.com/news/2023-08-china-baidu-ai-chatbot-ernie.html?utm_source=ONTRAPORT-email-broadcast&utm_medium=ONTRAPORT-email-broadcast&utm_term=Newsletter&utm_content=Data+Science+Insider%3A+September+1st%2C+2023&utm_campaign=02092023

Ernie Bot’s website:

https://yiyan.baidu.com/welcome

Posted on September 1, 2023

DSML trend: Nvidia is thriving in the AI boom at the expense of Intel and AMD

According to a report posted by Nikkei Asia today, Nvidia is thriving in the AI boom while Intel and AMD struggle to keep up. Nvidia, the GPU market leader, has seen its stock price triple since the beginning of the year. Its shares jumped over 6% in one week after reporting 101% year-over-year revenue growth on August 23rd. Nvidia racked up $13.51 billion in revenue last quarter, more than double the previous year's figure, largely driven by its data center business and AI chips like the H100 and A100. Meanwhile, AMD and Intel's share prices have dropped 7.41% and 4.08% respectively in the past month. Nvidia's data center revenue was nearly double the data center chip sales of Intel and AMD combined last quarter. It is a clear trend that Intel and AMD are increasingly vulnerable to losing market share in the traditional x86 CPU market.

That’s my take on it:

The above phenomenon highlights the limitations of Intel's x86 architecture for the computing-power demands of the era of big data and AI. While powerful for sequential tasks, x86's complexity makes it less optimized for massively parallel processing compared to GPU architectures. For AI/machine learning and other data-intensive applications, GPUs can provide 10-100x higher throughput. In addition, GPUs have very high memory bandwidth optimized for throughput, allowing fast access to large datasets, while CPUs have lower bandwidth. Sadly, Intel was reluctant to modify its architecture and slow to enter the GPU market. Although I don't think x86-based CPUs will disappear overnight, the future is undoubtedly trending toward GPU-accelerated computing. As GPU computing gains momentum, data scientists, educators, and DSML students need to adapt to this paradigm shift.
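As a rough illustration of that throughput gap, the following PyTorch snippet times the same matrix multiplication on a CPU and a GPU. This is a minimal sketch, assuming PyTorch with CUDA support is installed; actual speedups vary widely with hardware and workload:

    import time
    import torch

    x = torch.randn(8192, 8192)      # a large dense matrix

    t0 = time.time()
    _ = x @ x                        # matrix multiply on the CPU
    cpu_seconds = time.time() - t0

    xg = x.cuda()                    # move the data to the GPU
    torch.cuda.synchronize()
    t0 = time.time()
    _ = xg @ xg                      # the same multiply as a GPU kernel
    torch.cuda.synchronize()         # wait until the kernel actually finishes
    gpu_seconds = time.time() - t0

    print(f"CPU: {cpu_seconds:.2f}s, GPU: {gpu_seconds:.2f}s")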

P.S.: If I had bought Nvidia stock 10 years ago, I would be retired and sending this message on a cruise ship or at the beach right now.

Full text (may require subscription):

https://asia.nikkei.com/Business/Tech/Semiconductors/Nvidia-s-boom-gives-TSMC-a-lift-but-does-less-for-other-chipmakers?utm_campaign=GL_asia_daily&utm_medium=email&utm_source=NA_newsletter&utm_content=article_link&del_type=1&pub_date=20230901123000&seq_num=3&si=14108111

Posted on August 30, 2023

DSML Trend: Revival of OpenAI?

According to reports posted by “The Information” and Reuters on August 29, 2023, OpenAI is poised to achieve over $1 billion in revenue within the upcoming year through the sale of AI software and the corresponding computational capacity that drives it. Previously, the creators of ChatGPT had estimated revenue of $200 million for the current year. Notably, the company, backed by Microsoft, is now amassing a staggering revenue surpassing $80 million each month, a significant escalation from the mere $28 million garnered throughout the entirety of the preceding year.

That’s my take on it:

Two weeks ago, IT experts predicted that OpenAI might go bankrupt by the end of 2024 due to a decline in usage. Suddenly, this situation has undergone a surprising reversal. In my opinion, the future trajectory of OpenAI remains uncertain, because the fate of the company relies on a single product. Prior to ChatGPT, OpenAI boasted another flagship product known as DALL-E 2, a creative tool for generating visual art. However, the generative art market has now been predominantly seized by Midjourney, which boasts a user base of 15 million, the largest among all image generation platforms. In terms of overall image production volume, Stable Diffusion takes the lead with an impressive 12.59 billion images generated.

The question arises: should OpenAI reallocate its R&D resources to the more promising ChatGPT and relinquish DALL-E 2, or should it engage in a dual-front battle? This is an intricate puzzle that demands careful consideration.

Reuters’s report: https://www.reuters.com/technology/openai-track-generate-more-than-1-bln-revenue-over-12-months-information-2023-08-29/#:~:text=Aug%2029%20(Reuters)%20%2D%20OpenAI,in%20revenue%20for%20this%20year.

Statistics of generative art tools: https://journal.everypixel.com/ai-image-statistics

Posted on August 26, 2023

DSML trend: G2 Grid for Data Science and Machine Learning Platforms

On August 24, 2023, G2 released the G2 Grid for Data Science and Machine Learning Platforms. To be considered for inclusion in this DSML benchmarking, the candidate must adhere to the following criteria:

1.     Facilitate the connection of data to algorithms, enabling them to acquire and adapt knowledge.

2.     Enable users to construct machine learning algorithms and/or furnish pre-built machine learning algorithms suitable for less experienced users.

3.     Furnish a platform for the widespread deployment of artificial intelligence.

G2 classified DSML companies into four distinct quadrants, namely, leaders, high performers, contenders, and niche, utilizing a dual-dimensional framework: market presence and customer satisfaction. According to G2 scoring, currently the leaders of DSML are:

·      Databricks Lakehouse

·      IBM Watson Studio

·      Matlab

·      Alteryx

·      Vertex AI

·      SAS Visual Data Mining and Machine Learning

·      Anaconda

·      Saturn Cloud

·      Microsoft Azure Machine Learning

·      Deepnote

·      Amazon SageMaker and AWS Trainium

·      TensorFlow

·      Qlik AutoML

That’s my take on it:

The preceding list includes well-established companies like SAS, IBM, and Microsoft, alongside newcomers challenging the existing order. I admit that I do not possess the skill sets required for all of the software tools mentioned. Coping with the rapid evolution of technologies poses a considerable challenge for university professors, particularly in fields where progress is frequent. In my opinion, transitioning the emphasis from instructing specific skills to nurturing the capacity for perpetual learning is undeniably a valuable approach. To remain current, one effective tactic involves inviting guest speakers from industry or research domains to share their expertise and insights with students. This exposure acquaints students with real-world applications and prevailing industry methodologies. Moreover, it is imperative for faculty to motivate students to cultivate a mindset characterized by openness to change and a willingness to experiment. By the time my students graduate, G2, Gartner, Forrester, and IDC may compile a new list of DSML leaders!

Full report: https://www.g2.com/categories/data-science-and-machine-learning-platforms

Posted on August 26, 2023

In a recent piece published on KDnuggets (August 24, 2023), Dr. Mandar Karhade speculated about the architecture of GPT-4 based upon leaked information. The author posited that rather than being a singular colossal model, GPT-4 might consist of eight separate models, each bearing 220 billion parameters. This approach involves breaking down a given task into smaller subtasks, which are then tackled by specialized experts within these models; the strategy mirrors a divide-and-conquer methodology. A gating model is then introduced to determine the optimal expert for each subtask, culminating in the final prediction. However, the author included a disclaimer emphasizing the non-official nature of this information.

That’s my take on it:

At the present time, this notion remains an unverified rumor. Nevertheless, the idea holds a certain degree of credibility. The underlying concept closely resembles, if not mirrors, the principles of ensemble methods and model comparison, a common practice in the realm of Data Science and Machine Learning. In ensemble methods such as boosting and bagging, numerous modeling procedures are executed on partitioned subsets of data. Subsequent model comparison is conducted to select the most optimal solution derived from an array of modeling techniques: neural networks, SVM, bagging, boosting, among others. Hence, the synthesis of eight models in GPT-4 represents a natural progression akin to ensemble methods and model comparison, taking the idea a step further.
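To make the gating idea concrete, here is a minimal mixture-of-experts sketch in PyTorch. It is purely illustrative; nothing below reflects GPT-4's actual, undisclosed internals:

    import torch
    import torch.nn as nn

    class TinyMoE(nn.Module):
        """Blend the outputs of several small experts, weighted by a learned gate."""
        def __init__(self, dim=16, n_experts=8):
            super().__init__()
            self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
            self.gate = nn.Linear(dim, n_experts)  # learns which expert suits each input

        def forward(self, x):
            weights = torch.softmax(self.gate(x), dim=-1)                # (batch, n_experts)
            outputs = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, dim, n_experts)
            return (outputs * weights.unsqueeze(1)).sum(dim=-1)          # weighted blend of experts

    moe = TinyMoE()
    y = moe(torch.randn(4, 16))  # four toy inputs routed through the gate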

Full article: https://www.kdnuggets.com/2023/08/gpt4-8-models-one-secret.html

Posted on August 15, 2023

DSML trend: Europeans collaborate with China’s Huawei in AI-based weather forecasting

The European Centre for Medium-Range Weather Forecasts (ECMWF) has entered into a partnership with Huawei, a leading Chinese technology company, to launch an artificial intelligence-powered weather forecasting system. This collaboration aims to combine ECMWF's expertise in meteorology with Huawei's advanced AI capabilities. The new model, Pangu-Weather, was developed by Huawei and has demonstrated superior accuracy over traditional models. ECMWF selected the Pangu model after rigorous comparative testing showed it consistently outperformed other models, including in predicting extreme weather events. A recent Nature journal article provides further validation of the Pangu model's capabilities, highlighting its ability to achieve highly precise forecasts at speeds up to 10,000 times faster than legacy weather models.

That’s my take on it: Current U.S. export restrictions prohibit Nvidia from exporting certain high-performance AI chips like the A100 and H100 GPUs to China. A recent executive order also restricts U.S. investment into key Chinese technology sectors including semiconductors, AI and quantum computing. In anticipation of further export restrictions, major Chinese technology companies have been urgently placing large orders for high-performance Nvidia chips, with recent estimates valuing these bids at around $5 billion. It remains unclear whether Huawei will be able to fully capitalize on ECMWF's capabilities given these limitations on accessing critical U.S. technologies. Nonetheless, developing highly accurate weather forecasting is increasingly crucial as extreme weather events become more frequent, likely due to climate change.

Full text:

https://opentools.ai/news/weather-forecasting-using-ai-europe-and-china-take-the-lead/

https://www.neowin.net/news/chinese-internet-giants-go-on-a-5-billion-nvidia-shopping-spree-to-power-their-ai-ambitions/

Posted on August 14, 2023

DSML trend: IBM will integrate Meta’s Llama into Watson

On August 9, IBM announced plans to host Meta's 70-billion-parameter Llama 2 large language model on its Watson AI and data science platforms. Currently, in watsonx.ai, users can leverage pre-trained models from IBM and Hugging Face for natural language processing tasks, such as content generation and summarization, as well as text classification and extraction (text mining). The future addition of Llama 2 to watsonx.ai will be a milestone for IBM's generative AI roadmap, likely followed by upcoming releases of its AI Tuning Studio.
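As a minimal sketch of what working with Llama 2 looks like outside watsonx today, the 70-billion-parameter chat variant can be loaded through Hugging Face's transformers library. This assumes the transformers and accelerate packages are installed, access to Meta's gated weights has been granted, and ample GPU memory is available; it is not IBM's watsonx API:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-70b-chat-hf"  # Meta's 70B chat model on Hugging Face
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "Summarize in one sentence: IBM plans to host Llama 2 on watsonx.ai."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=60)
    print(tokenizer.decode(output[0], skip_special_tokens=True))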

That’s my take on it: IBM's flagship data science products are Watson Studio and SPSS Modeler. For a long time, IBM has trailed its top competitor SAS Institute in user base, interface, and capabilities. Nevertheless, IBM has invested in AI research and development since the 1950s. In 1997, IBM's Deep Blue beat the world chess champion in a six-game match. In 2011, IBM's Watson competed and won against top human Jeopardy! contestants. Although Meta's Llama is less powerful than models like Claude 2, Google Bard, and ChatGPT, incorporating a large language model into IBM products is still strategic. However, it's too early to tell whether IBM can overtake SAS in the near future.

Full announcement: https://newsroom.ibm.com/2023-08-09-IBM-Plans-to-Make-Llama-2-Available-within-its-Watsonx-AI-and-Data-Platform

Posted on August 14, 2023

DSML trend: OpenAI faces financial challenges and the rise of Claude

 

An article in yesterday's Business Today (August 13) reported that OpenAI, the pioneering AI company that brought ChatGPT to the mainstream public, is facing financial challenges. The costs to operate ChatGPT amount to around $700,000 per day. Despite efforts to monetize GPT-3.5 and GPT-4, OpenAI has yet to earn sufficient revenue to cover its expenses. According to SimilarWeb data., ChatGPT's user base declined 12% from June to July 2023, dropping from 1.7 billion to 1.5 billion monthly users.

 

That’s my take on it: Researchers at Stanford and UC Berkeley systematically evaluated different versions of ChatGPT. It was found that in a math test, ChatGPT (GPT-4) solved 488 out of 500 problems correctly in March (97.6% accuracy). By June, its accuracy had plummeted to 2.4%. ChatGPT's global website traffic fell, especially after the launch of Claude 2. Claude 2 scored 71.2% on a Python coding test versus ChatGPT's 67%. Claude is also more up to date, with an early-2023 training cutoff versus September 2021 for ChatGPT. While it's premature to declare the end of ChatGPT, the future landscape of large language models is volatile as more competitors enter the market.

 

Full article: https://www.businesstoday.in/technology/news/story/openai-might-go-bankrupt-by-end-of-2024-chatgpt-costing-over-rs-580-crore-per-day-report-393925-2023-08-12

Posted on August 14, 2023

In an essay published on August 9th, Andrew Ng, a co-founder of Google Brain and former Chief Scientist at Baidu, posited that Othello-GPT, a powerful large language model, demonstrates a noteworthy degree of world comprehension. To be specific, during its training phase involving gameplay, the neural network exclusively processed sequences of moves; it was never explicitly provided with information indicating that these sequences pertained to actions on an 8x8 board or to the game's rules. However, after extensive training on a substantial dataset of such moves, the model exhibited proficiency in predicting subsequent moves on an 8x8 board. By the same token, certain large language models trained in English have displayed an ability to "learn" additional languages, enabling them to comprehend and follow instructions in those languages. This observation has led both Andrew Ng and Geoff Hinton to the shared conclusion that large language models genuinely possess a form of world understanding.

 

That’s my take on it: Comprehension, or understanding, in the realm of psychology goes beyond the mere perception of sensory input. Rather, it involves active engagement with information, its connection to pre-existing knowledge and personal experiences, and the construction of a coherent mental representation or interpretation. However, even if LLMs appear to fulfill these criteria, do they genuinely understand the world, or do they merely behave as if they do?

 

Full text: https://www.deeplearning.ai/the-batch/issue-209/?utm_source=tldrai

Posted on August 1, 2023

The Center for Consciousness Studies at the University of Arizona and the California Institute for Human Science will co-host a conference entitled “Neuroscience needs a revolution to understand consciousness” from August 18 to 23, 2023. One of the keynote speakers is Sir Roger Penrose, a British mathematician, physicist, philosopher of science, and Nobel Laureate in Physics. The following is a brief introduction to the theme of the conference.

“AI has reinforced the notion of the brain as a complex computer of simple, empty, ‘cartoon’ neurons based on 1950s physiology, processing solely by surface membranes, synaptic transmissions and firings as ‘bit-like’ units in frequencies up to 100 hertz… The Penrose-Hameroff ‘Orch OR’ theory proposes consciousness depends on ‘orchestrated’ (‘Orch’) quantum superpositions leading to Penrose ‘objective reductions’ (‘OR’, wavefunction self-collapses) in brain microtubules, connecting to fundamental spacetime geometry. Orch OR has more explanatory power, connection to biology, and experimental validation than all ‘neuroscientific’ theories based on low frequency, oversimplified cartoon neurons combined… Neuroscience needs a revolution inward, to deeper, faster quantum processes in microtubules to understand consciousness and treat its disorders.”

That’s my take on it:

Many experts speculate that AI may eventually attain self-consciousness, potentially posing a threat to humanity. The concept of consciousness raises several fundamental questions: What is consciousness? How can we ascertain whether an AI system is genuinely self-conscious? Do you have to fully understand consciousness in order to know whether a machine is self-aware? The widely-used Turing test, considered behavioristic, is deemed unreliable for this purpose.

During the 1980s and 1990s, Roger Penrose expounded on the notion of human consciousness in his books "The Emperor's New Mind" and "Shadows of the Mind." He argued that consciousness involves non-algorithmic processes that defy computational reduction. Penrose also criticized the concept of Strong AI, which contends that machines can achieve human-like consciousness. He posited that human attributes such as creativity, insight, and mathematical intuition are beyond the reach of artificial systems due to their dependence on non-computable processes. I have registered for the conference (online only: $75). I look forward to hearing updates of Penrose's arguments during the event.

Registration: https://www.eventbrite.com/e/neuroscience-needs-a-revolution-to-understand-consciousness-tickets-667018659767?aff=oddtdtcreator

Posted on July 21, 2023

DSML Trend: New role of data scientists by embracing economic thinking

In response to the burgeoning influence of generative AI (GenAI), Bill Schmarzo has authored an insightful article titled "Next-Gen Data Scientist: Thinking Like an Economist" on Data Science Central. This article explores the parallels between economic principles and data science methodologies, underscoring the criticality of considering trade-offs, incentives, and resource allocation in data-driven decision-making processes. As outlined in a recent report by McKinsey, GenAI is projected to potentially automate up to 40% of the tasks currently executed by data science teams by 2025. These tasks, including data preprocessing, coding, and hyperparameter tuning, can be more effectively and efficiently accomplished through AI assistance. Nevertheless, it is crucial to recognize that GenAI lacks significant domain knowledge, setting it apart from human experts. By embracing an economic mindset, data scientists can optimize their strategies, thoughtfully prioritize projects based on potential returns, and skillfully communicate insights to stakeholders, thus providing robust decision support.

That’s my take on it:

For a long time, I have advocated against an excessive focus on data wrangling and programming within DSML (Data Science and Machine Learning) education. First, if the data collection protocol and data architecture are well-designed, there is no need to waste our time on data cleaning and data conversion. Second, complicated coding can, to some extent, hinder the discovery of insightful knowledge. Looking ahead, as AI progressively assumes responsibility for more low-level tasks, data scientists should concentrate their efforts on analytics and interpreting the implications of results for end-users.

Full article:

https://www.datasciencecentral.com/next-gen-data-scientist-thinking-like-an-economist/?utm_source=ONTRAPORT-email-broadcast&utm_medium=ONTRAPORT-email-broadcast&utm_term=Newsletter&utm_content=Data+Science+Insider%3A+June+21st%2C+2023&utm_campaign=22072023

Posted on July 21, 2023

DSML trend: Generative AI fails to spark a strong demand for microchips

Today (July 21) Taiwan Semiconductor Manufacturing Co (TSMC) reported a sharp 23% decline in Q2 earnings, indicating that the recovery in the global semiconductor market is happening at a slower pace than anticipated. TSMC now projects a 10% revenue contraction for the full year 2023, reversing its previous forecast of slight growth. Peer companies like Samsung and Micron have also posted earnings declines, further signaling weakness in the industry. Apparently, generative AI has failed to spark a strong demand for microchips. Many generative AI services are delivered via servers equipped with NVIDIA GPUs, and to sustain growth in these services, expansion of data centers is expected. However, generative AI's impact will take time due to the need for advanced chip packaging. It is predicted that AI will not lead to a full-scale recovery in demand for semiconductors until 2024.

That’s my take on it:

Even though generative AI is more technologically advanced than the Internet, why hasn't it created an economic boom similar to the Internet revolution of the 1990s? As far as I know, generative AI is currently still experimental. Unlike e-commerce, which had a straightforward way to make money by selling products or services online, generative AI does not yet have proven business models. Companies are still figuring out how to commercialize the technology. Further, while generative AI shows promise for some business uses, it currently has limitations in understanding context and executing practical tasks. To put it bluntly, it is fun to chat with ChatGPT, Google Bard, and Claude 2, but the information provided by these large language models is not 100% accurate, and it seems that widespread enterprise adoption will take more time. Nevertheless, I believe that it will happen soon! Those who are unprepared will be left behind.

Full article (subscription required):

https://asia.nikkei.com/Business/Tech/Semiconductors/TSMC-s-woes-signal-global-chip-slump-extending-throughout-year?utm_campaign=GL_asia_daily&utm_medium=email&utm_source=NA_newsletter&utm_content=article_link&del_type=1&pub_date=20230721123000&seq_num=9&si=14108111

Posted on July 16, 2023

DSML Trend: Elon Musk’s view on xAI and superintelligence on Twitter

Last Friday (July 14) Elon Musk held a Twitter Spaces conversation to discuss his new AI company called xAI. A total of 40,000 people attended the event. The xAI researchers were recruited from OpenAI, Google DeepMind, and the University of Toronto. According to Musk, the goal of xAI is to create AI systems that are highly intelligent, curious, and truthful. Musk wants xAI to study questions about physics like dark matter, as well as why there is little evidence of aliens despite the age of the universe. He believes these are math problems that can be solved with powerful AI. Interestingly, Musk points out that today even the most powerful neural networks cannot produce a novel on a par with human writers. He asserts that current AI research counting on the brute force of computing is missing something. Based on the lessons learned in Tesla, Musk argued that researchers might overcomplicate the problem and the solution might be much simpler than we thought. xAI intends to release a product to the public soon, likely a chatbot trained on Twitter conversations. Musk wants the xAI chatbot to say what it truly thinks, without politically correct guardrails. More details on xAI's first product will be provided in two weeks.

That’s my take on it: 

AI is a highly competitive field. For years, Google, Microsoft, Apple, Meta, and other key players have invested billions of dollars in AI research. During the Twitter Spaces interview, Musk admitted that xAI would take some time to catch up with OpenAI and Google. Nonetheless, given his success with Tesla and SpaceX, it is conceivable that xAI could introduce a new large language model based on a new paradigm in the near future. Although Musk didn’t disclose the details of the lessons learned at Tesla, I guess xAI’s approach will simplify existing neural networks, much as Reduced Instruction Set Computing (RISC) was designed to simplify Complex Instruction Set Computing (CISC).

The audio file of the interview on YouTube:

https://www.youtube.com/watch?v=Xd8fNiySYLQ

Text-based summary:

https://www.fastcompany.com/90924249/elon-musk-xai-aliens

Posted on July 15, 2023

The following YouTube video presents a concise comparison of Claude 2 and ChatGPT. According to the YouTuber, “Claude 2 presents a significant leap in AI technology with unique abilities like summarizing up to 75,000 words and achieving impressive scores in diverse tests, outperforming its predecessor and its competitor, ChatGPT from OpenAI. Claude 2 not only offers advanced functionality, but also prioritizes safety, striving to minimize harmful or offensive content, and affordability, undercutting ChatGPT's API cost significantly.”

https://www.youtube.com/watch?v=nuUxG-C7iMc

  

I asked Claude 2 to compare itself against ChatGPT. The following is the answer from Claude:

“Limitations:

·      Claude has more limited conversational abilities compared to ChatGPT, which was explicitly trained for dialogue.

·      ChatGPT tends to be more verbose, while Claude gives shorter, more focused answers.

·      Claude has significantly less training data than ChatGPT, constraining its knowledge breadth, though Claude aims to mitigate this through efficiency.

·      Both may occasionally generate biased, unethical, or implausible responses, requiring caution and human evaluation of outputs.

·      They lack a consistent personality or worldview, since they don't have real experiences.

In summary, Claude and ChatGPT have common capabilities but ChatGPT is more conversant, while Claude is more concise and targeted, reflecting their different underlying architectures and training. Both have limitations inherent to current AI.”

Posted on July 15, 2023

A few days ago (July 11), Anthropic, an AI startup based in the US, launched its powerful chatbot Claude 2. It is claimed that this chatbot is the closest competitor to OpenAI's ChatGPT. Aside from text-based prompts, Claude also accepts CSV data sets, PDFs, and other types of documents for analysis. Claude is also good at taking high-stakes exams. The most recent version of Claude achieved a score of 76.5 on the multiple-choice section of the Bar exam. On the GRE reading and writing sections, Claude 2 scores above the 90th percentile, and on the quantitative reasoning section, it scores on par with the median applicant.

The following are my test results:

I uploaded a PDF about alleged data fraud discovered in research articles written by Professor Francesca Gino and Professor Dan Ariely. I then asked Claude to summarize the document. The summary is excellent because it covers all the important aspects in a clear and concise manner.

The CSV data set I uploaded to Claude was tricky. A number of variables have extremely skewed distributions, which necessitates data transformation. However, Claude performed the analysis only on the raw, untransformed data, and no visualization of the data was provided. At present, Claude does not appear to be a reliable data analytics tool.
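To illustrate the kind of pre-processing I expected, here is a minimal sketch in Python with simulated data (not my actual data set): a log transform pulls a heavily right-skewed variable back toward symmetry, which is precisely the step Claude skipped.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # heavily right-skewed variable

print(round(stats.skew(x), 2))          # large positive skewness
print(round(stats.skew(np.log(x)), 2))  # near zero after a log transform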

I input the following three questions to both Claude and ChatGPT:

1.     What are the differences between item response theory and Rasch modeling?

My comments: I find both answers to be accurate and fairly comprehensive. However, both failed to discuss the guessing parameter (g) and data-model fitting order.

2.     What are the limitations of differential item functioning?

My comments: Again, both are accurate and fairly comprehensive, but neither one discussed the differences between non-IRT DIF and IRT-based DIF.

If students use Claude or ChatGPT to answer the above exam questions, they can earn at most a “B+”.

3.     Write a SAS program for DBSCAN.

My comments: Claude and ChatGPT used two different approaches to solve the problem. The former employed SAS macro programming whereas the latter utilized SAS’s Interactive Matrix Language (IML). No doubt my coding time can be reduced from hours to minutes with these tools.
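Since the SAS code is not reproduced here, the following minimal Python sketch (scikit-learn) shows what DBSCAN does on a toy data set; it is offered only as an illustration of the algorithm, not the macro or IML code the chatbots generated.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: a shape that k-means handles poorly
# but density-based clustering handles well.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps is the neighborhood radius; min_samples is the number of
# neighbors required for a point to qualify as a core point.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

print(np.unique(labels))  # cluster labels; -1 marks noise points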

Sign up for Claude AI (US and UK only):

https://claude.ai/login

Posted on July 14, 2023

Recently the California State Board of Education approved significant changes to the K-12 math curriculum, integrating data science and emphasizing real-world applications. In response to the growing importance of data science in society, as well as the need to prepare students for careers requiring strong data analytics and problem-solving skills, the board has approved data science courses in many high schools over the past two years. However, the University of California faculty committees that oversee high school courses accepted for UC admission argued that Algebra 2 should not be replaced with data science, because doing so would under-prepare students who plan to major in STEM.

That’s my take on it: 

In essence, the debate boils down to the purpose of education. As the name implies, data science is more empirical and data-driven, whereas theoretical mathematics is more logical and model-based. To equip students with job skills sought by the market or to solve real-life problems, it seems that data science is preferable to theoretical mathematics. However, advanced math is also necessary for developing abstract reasoning and symbolic processing. My question is: why can't they keep both? 

Articles:

https://edsource.org/2023/uc-committee-changes-admission-standard-for-data-science-causing-confusion-over-math-framework/693892

https://www.edweek.org/teaching-learning/california-adopts-controversial-new-math-framework-heres-whats-in-it/2023/07

Posted on July 8, 2023

On July 5, 2023 OpenAI announced a new initiative called “Superalignment” that aims to resolve the alignment problem. According to Ilya Sutskever, OpenAI's Chief Scientist, and Jan Leike, its Head of Alignment, who co-lead the Superalignment team, although superintelligence will be the most impactful technology that could help us solve many important problems, its vast power could also threaten humanity or even result in human extinction. In response, superalignment is introduced as a proactive process of ensuring that superintelligent AI will follow human intent. Their approach is to build a human-level automated alignment researcher to validate the resulting model and to spot problematic behaviors of an AI system.

That’s my take on it: 

If you are not familiar with the alignment problem, “The Alignment Problem: Machine Learning and Human Values” (Christian, 2020) is an accessible introduction. This book discusses the ethical and psychological challenges that arise when the goals of AI systems and human values are misaligned. When we instruct an AI system to complete a specific task, the system may attempt to achieve the goal at all costs and by any means, and the method may not be aligned with human interests and values. For example, if we ask AI to eliminate spam emails, it might delete all email accounts in order to attack the root cause of the problem. If a factory owner instructs the AI system to produce paper clips in the most cost-effective way, all metals might be redeployed by the AI system to the paper clip factory, crowding out all other priorities. The scenarios presented here are very simplistic. Unlike conventional computers that require pre-programming, machine learning is self-evolving. As AI becomes more advanced, its behaviors might become more unpredictable, and the consequences may far exceed our predictions. Can superalignment resolve, or at least alleviate, the alignment problem? It is very difficult, if not impossible, to predict the unpredictable.

OpenAI announcement:

https://openai.com/blog/introducing-superalignment

Challenges and Criticisms of OpenAI's Approach to AGI:

https://www.youtube.com/watch?v=CkOolFpHWM8

Posted on July 7, 2023

The International Telecommunication Union's (ITU) annual AI for Good Summit, held on July 6 and 7, 2023, aims to harness the power of AI to address global challenges and promote sustainable development. The conference brought together experts from various fields, including healthcare, climate change, and education, to discuss and explore AI applications. ITU is a Geneva-based United Nations agency that represents all 193 member states as well as over 900 companies, universities, and other organizations. The AI for Good Summit is therefore a truly global conference.

That’s my take on it:

Despite this representativeness, ITU announced no formal declaration, negotiated statement, or decision. Although the discussions at the Summit led to the creation of focus groups for developing new standards, as well as for addressing the impact of AI-enabled androids on humans, it is very difficult, if not impossible, for rival countries that embrace different political ideologies and ethical standards to reach a consensus. I read the closing statement of the summit. Frankly speaking, it is very general and vague.

ITU Statement On The Closing Of The 2023 AI For Good Global Summit:

https://www.publicnow.com/view/8ABBE69C860E65230F0AB9858CFA4696EBE691FA?1688759234

ITU AI For Good Global Summit 2023 Press conference:

https://www.youtube.com/watch?v=m9IN14e-PLk  

Posted on July 7, 2023

The World Artificial Intelligence Conference (WAIC) is currently being held in Shanghai, China. In a keynote speech at the conference, China's Vice-Minister for Industry and Information Technology, Xu Xiaolan, said the country plans to develop a complete AI value chain, covering everything from chips and algorithms to large language models (LLMs). In addition, the Ministry of Industry and Information Technology of China announced that the government will fully support 360, Baidu, Huawei, and Alibaba in AI R&D by actively promoting the development of a national standard system.

That’s my take on it:

There is no doubt that China's AI development faces an uphill battle, since the US limits the export of cutting-edge technology to China, including top GPU models from AMD and Nvidia. Nonetheless, there are other channels through which Chinese scientists and engineers can gather crucial information for AI development, such as borrowing open-source code and collaborating with companies that are friendly to China. For example, Elon Musk, the founder of Tesla and SpaceX, is opposed to decoupling between the US and China. At WAIC Musk said, “China is going to be great at anything it puts its mind into. That includes…artificial intelligence.” It is likely that Chinese AI scientists and engineers will learn from Tesla.

English text:

https://www.scmp.com/tech/big-tech/article/3226808/china-asserts-ai-development-goals-shanghai-conference-beijing-braces-potential-new-us-tech

Chinese text:

https://www.worldjournal.com/wj/story/121474/7285133?from=wj_breaknews_index

Posted on July 6, 2023

Today, Tesla showcased its AI products at the World Artificial Intelligence Conference in Shanghai, China, along with 400+ exhibitors. In addition to its Autopilot (Full Self-Driving) cars, Tesla displayed a prototype of its Optimus robot. The Optimus uses the same technology as Tesla vehicles, including a full self-driving computer and the Tesla Vision neural network. The humanoid robot is 172 centimeters tall and weighs 56.6 kilograms, comparable to an average adult. Like the human body, the robot has 28 degrees of freedom overall, and its hand alone has 11 degrees of freedom, giving it a high degree of flexibility and dexterity. Thanks to its powerful motors, the robot is said to be capable of lifting a piano with just one hand. It can also walk, climb stairs, squat, and pick up objects, and it already has the capability to protect itself and other people. In the future, robots may cook, mow the lawn, care for the elderly, or replace humans in dangerous and boring factory jobs.

That’s my take on it:

According to some commentators, Tesla's robots are ahead of Boston Dynamics' because Boston Dynamics' robots require preprogramming for movement, while Tesla's can evolve through machine learning. I know what's on your mind: could a self-learning robot harm humans at some point if it gets out of control? People may even wonder whether the Optimus will be weaponized, since it can lift a piano with one hand and defend itself. Will it be used for evil purposes? Regulations should be discussed as early as possible.

P.S.: I want the Optimus if I can afford one. Currently I am moving from LA to Honolulu. I need a robot that can lift heavy objects for me!

English text:

https://www.teslaoracle.com/2023/07/06/tesla-cars-optimus-robot-world-ai-conference-shanghai-elon-musk-ai-chinese-cars/

Chinese text:

https://www.chinatimes.com/realtimenews/20230706005385-260412?utm_source=likr&utm_medium=web_notification&utm_content=VOY0&avivid_manual=4

Posted on June 29, 2023

Last week (June 20) Microsoft AI researchers published a paper entitled “Textbooks Are All You Need.” In this paper they introduce a “small” large language model called phi-1 with only 1.3 billion parameters, far fewer than GPT-4, whose parameter count is undisclosed but widely believed to be orders of magnitude larger. After being trained in four days on a system with eight Nvidia A100 GPUs using a set of “textbook quality” data from the Internet, phi-1 is able to achieve 50.6% on HumanEval, a benchmark that measures the functional correctness of synthesized programs (a sketch of how such scores are computed appears after my comments below). Even when the number of parameters is reduced to 350 million, it still achieves 45%.

Full paper: https://arxiv.org/abs/2306.11644

Summary: https://www.youtube.com/watch?v=H_bLpa9oAJ8

That’s my take on it:

AI and big data are symbiotic. As AI ethicist Julie Mehan said, “AI is useless without data, and mastering today’s ever-increasing amount of data is insurmountable without AI.” In light of this reasoning, a new Moore's law seems to be emerging: large language models keep getting bigger over time. However, more parameters and more data are not necessarily better; data quality also matters. Back in 1974 Blalock wrote, “The more errors that creep into the data collection stage, the more complex our analyses must be in order to make allowances for these errors.” This statement is true in both traditional statistics and DSML. Less is more! The Microsoft approach may be a game changer!
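A side note on the HumanEval numbers quoted above: the benchmark reports “pass@k,” the probability that at least one of k generated programs passes all unit tests. Below is a minimal sketch of the unbiased estimator described in the original HumanEval paper, where n samples are drawn per problem and c of them pass:

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    # computed in a numerically stable product form.
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# For example, with 200 samples per problem and 40 passing the tests:
print(pass_at_k(n=200, c=40, k=1))  # 0.2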

Posted on June 28, 2023

In a $1.3 billion deal announced two days ago (June 26), Databricks, an industry leader in data management, will acquire MosaicML, a generative AI platform that empowers enterprises to build their own AI. According to Databricks, the rationale of this acquisition is: “Today, virtually every organization is exploring how best to use generative AI and LLMs, and every leader is considering how they leverage these new innovations while retaining control of their most precious resource: their data.”  

That’s my take on it:

The technology industry is undergoing a wave of AI acquisitions. In early May Databricks acquired Okera, a data governance platform with a focus on AI. In late May Snowflake acquired Neeva, an AI-enabled search engine that could enhance its cloud data management capabilities. Aside from acquisitions, forming partnerships is another common AI strategy. Yesterday (June 27) at Snowflake Summit 2023, SAS announced that SAS Viya’s AI-based decision-support capabilities have been incorporated into the Snowflake Data Cloud with Snowpark Container Services. Needless to say, those who fail to catch the wave and operate in silos may eventually lose out to more powerful competitors. Hence, I believe it is imperative to teach students (our future workforce) how to integrate various tools, or at least understand the "big picture."

Full articles:

https://www.databricks.com/company/newsroom/press-releases/databricks-signs-definitive-agreement-acquire-mosaicml-leading-generative-ai-platform

https://finance.yahoo.com/news/sas-runs-sas-viya-ai-191700223.html

https://techcrunch.com/2023/05/24/snowflake-acquires-neeva-to-bring-intelligent-search-to-its-cloud-data-management-solution/

https://techcrunch.com/2023/05/03/databricks-acquires-ai-centric-data-governance-platform-okera/

Posted on June 27, 2023

In response to AI bias, Stability AI, the London-based company that created Stable Diffusion, is working on generative AI tailored for Japanese users. Like Midjourney and DALL-E 2, Stable Diffusion allows users to create photorealistic artworks using natural language. The problem is that this type of machine learning system is constantly fed English or Western data sources. In most cases, if a user requests a picture of a house, a woman, or a man, it is likely that they will receive an image of a Western house or a White person. To address the issue, Stability AI plans to release an experimental Japanese-language AI tool based on localized data.

Full story (subscription required): https://asia.nikkei.com/Business/Technology/OpenAI-rival-says-AI-tech-needs-more-Asia-localization

That’s my take on it:

I grew up in Hong Kong. During my childhood, I watched many Japanese sci-fi TV programs and films, including Masked Rider (幪面超人), Ultraman (鹹蛋超人), and Japanese Iron Man (鐵甲萬能俠). In those science fiction stories, all of the heroes that save humanity are Japanese mutants, Japanese-like aliens, or Japanese-made robots. Interestingly, all space aliens in those shows speak Japanese! Nevertheless, this type of presentation does not strike me as biased. For local artists and content creators, making things based on their own experience is natural and rational. I am not downplaying Stability AI's good intentions and efforts. True, if the user enters words such as "house" or "people", the AI system may default to generating images of Western houses or Caucasian people. However, if I specify a Japanese house or a Japanese woman in the prompt, it will display exactly what I request.

Posted on June 16, 2023

In a recent benchmark study, the Futurum Group compared SAS Viya with several open-source software packages, such as Apache SparkML, H2O, and Ranger, in terms of scalability and performance. In this study, random forest, gradient boosting (including LightGBM and XGBoost), linear regression, and logistic regression were rigorously tested on big data. It was found that for running machine learning with high-dimensional data, SAS Viya is on average 30 times faster than all other competitors across 1,500 tests. Specifically, the SAS Viya solution delivered results in under 12 minutes on a dataset containing over 300 million data points, while SparkML and another rival failed to deliver results after running for hours. For traditional procedures, such as linear regression and logistic regression, SAS Viya ran faster in 49 out of the 50 tested configurations.

Full report:

https://futurumgroup.com/wp-content/uploads/2023/06/Lab-Insight_Performance-at-Scale_Comparing-AI-ML-Performance-of-SAS-Viya-vs-Alternatives.pdf

That’s my take on it:

Having used both open-source and proprietary software applications, I do not believe that we should side with one camp or the other. Open source is touted as a great tool, but in my view its advantages are overstated. I am not surprised by this benchmark result. While developers of open source are a loose conglomerate, resulting in incompatibility and redundancy, R&D in commercial corporations, such as SAS, IBM, and Microsoft, is coordinated and thus coherent. As a matter of fact, a lack of financial incentives makes it difficult for volunteers to devote substantial time and effort to optimizing machine learning code. OpenAI has taken the world by storm with its ChatGPT, but few people know that OpenAI first adopted an open-source model in an attempt to liberate people from big tech monopolies. However, Cade Metz, the author of Genius Makers, made a harsh comment by saying “It (OpenAI) was an idealistic vision that would eventually prove to be completely impractical.” Yann LeCun, the inventor of the convolutional neural network (CNN), even predicted at the beginning that this model was doomed to fail. Within a few years, OpenAI became a for-profit, closed-source company.

P.S. On June 12 it was announced that Google DeepMind, OpenAI and Anthropic agreed to open up their AI models to the U.K. government for research purposes only. It is not completely open source.

Posted on June 16, 2023

Yesterday (6/15) an article published by the IMD discussed how the role of data scientists is changing. In 2012, Harvard Business Review identified data scientist as the sexiest job of the 21st century. However, today user-friendly software is simplifying complex tasks that previously necessitated data scientists. Until recently, data wrangling, such as cleaning, restructuring, re-formatting, and pre-processing, required the expertise of data scientists; in the near future, AI-enabled software tools will handle this type of menial data preparation. According to Gartner, a prominent consulting company, by 2025 70% of new applications developed by organizations will be low- or no-code solutions, up from under 25% in 2020. As a result, the role of data scientists will evolve from that of an astronaut (who uses state-of-the-art technology to explore uncharted territory) to that of a champion race-car driver (who uses standardized technologies to navigate in real life). The authors made three recommendations for coping with these trends: 1. Reskill existing employees. 2. Hire data scientists for specialized purposes, such as sophisticated applications, scalability, and innovation. 3. Invest in an analytics infrastructure that can produce usable data.

Full article (subscription required):

https://www.imd.org/ibyimd/innovation/the-changing-role-of-the-data-scientist-from-astronaut-to-race-car-driver/

That’s my take on it:

I completely agree with all three recommendations. That's exactly what I've been proposing; I've held a minority opinion about DSML for a long time. At this moment, most DSML training programs place too much emphasis on Python and R programming, probably because they conflate programming with analytics. Over the years, I went through the transition from TSO on the IBM mainframe and DOS on the PC to the GUI on the Mac. The lesson is: if I can build an interactive data visualization in Tableau, JMP, or SAS Viya using drag and drop in three minutes, I don't see a reason to spend an hour building the same thing by coding Dash in Python. Just like the transition from DOS to GUI, the move to low-code and no-code solutions is a natural and irreversible trend. Moreover, paying data scientists big salaries to do data wrangling isn't cost-effective at all. A good data plan can prevent 90% of data issues. I am surprised to see that today some systems still truncate the year variable to three digits (from “2001” to “201”) or code “yes/no” as “1/2”! And some organizations still collect unused or unusable data without pre-conceptualized research questions!
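Simple validation checks would catch both of those problems early. Here is a minimal sketch in Python with pandas, using invented data that mirrors the two data-entry errors just mentioned:

import pandas as pd

# Hypothetical extract illustrating the two problems described above:
# a truncated year and an ambiguous 1/2 "yes/no" code.
df = pd.DataFrame({
    "grad_year": [2001, 201, 1999, 2020],  # "201" is a truncated "2001"
    "employed": [1, 2, 1, 2],              # 1/2 instead of explicit labels
})

# Flag years outside a plausible range.
print(df[~df["grad_year"].between(1900, 2030)])

# Recode 1/2 into self-documenting labels.
df["employed"] = df["employed"].map({1: "yes", 2: "no"})
print(df)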

 

Posted on June 15, 2023

Currently OpenAI's CEO Sam Altman is traveling around the world to learn what people want from AI, what they are doing with it, and how we can regulate this emerging technology. At the present time, China and the U.S. are taking very different approaches to regulating AI. The top-down regulatory strategy of China is characterized by state control and an emphasis on national security, whereas the bottom-up approach of the United States has the government taking a back seat.  There is no federal regulation on AI in the US. To fill the vacuum, Microsoft and Google established internal AI governance teams and published their own AI principles.

Full article (subscription required): https://asia.nikkei.com/Spotlight/The-Big-Story/ChatGPT-unleashed-an-AI-race-now-regulators-are-struggling-to-hold-on?utm_campaign=GL_editor_in_chief_picks&utm_medium=email&utm_source=NA_newsletter&utm_content=article_link&del_type=2&pub_date=20230616150007&seq_num=16&si=14108111

That’s my take on it: Paul Kedrosky, managing partner at SK Ventures, said, "The top-down approach China uses actually has huge merits, in my opinion. When the bottom-up approach is too slow, it seems irresponsible and even immature and childish to wait for things to happen." I tend to disagree. A test of China's chatbots found that some information is not accessible because Chinese regulatory requirements ensure an AI-enabled chatbot won't make mistakes on "important and sensitive topics.” However, regulations that undermine freedom of thought ultimately undermine innovation.

Posted on June 13, 2023

Google recently announced several new DSML products, some of which are still in the experimental stage. One of these innovative products is StyleDrop, an AI-enabled art tool that allows users to generate images in a consistent style. In StyleDrop, the user can easily transfer the desired style of an original image to a new image while preserving its unique characteristics. Furthermore, Google announced last week that it has partnered with Salesforce, the parent company of Tableau, to integrate data analytics into its cloud platform. Specifically, Google and Salesforce plan to integrate Data Cloud and BigQuery to enable businesses to create unified customer profiles more efficiently.

YouTube video: https://www.youtube.com/watch?v=q_ebiphq2Pk&t=735s

Article: https://techcrunch.com/2023/06/07/google-cloud-and-salesforce-team-up-to-bolster-ai-offerings/

That’s my take on it:

Even though Midjourney and Stable Diffusion are good at generating art, neither produces a consistent style. Google is smart not to follow a "me-too" strategy. Adding features similar to those offered by Midjourney and Stable Diffusion is unlikely to entice customers away from those established generative art platforms, but customers will give Google a try for something new. By the same token, it would be very difficult for Google Cloud to compete with Amazon Web Services in terms of cloud computing capabilities. Rather, it will be more beneficial for Google to leverage data visualization through the strategic partnership with Salesforce.

Posted on June 9, 2023

A while ago Microsoft announced several new features in Windows 11, but at that time most articles focused only on Windows Copilot, an AI version of Clippy. Specifically, Windows Copilot is a digital personal assistant that can help the user complete tasks through natural language inputs. The following YouTube video provides a more comprehensive overview of Windows 11 enhancements. Besides Windows Copilot, the video also covers AI tools for developers using Azure (the Microsoft cloud computing platform) and ONNX Runtime, as well as AI enhancements to the Microsoft Store. In addition, Microsoft has increased support for ARM-based systems.

Video: https://www.youtube.com/watch?v=3iqBWwkZfNM

That’s my take on it:

Since its release Windows Copilot has been the subject of many articles, so I want to shift the focus to Windows on ARM. Although ARM, introduced in 1985, is not directly related to AI, it is still crucial to high-performance computing and big data analytics. Chris Miller argued in his book "Chip War" that Intel's long dominance of the CPU market is due to luck. There is no doubt that the x86 architecture is not the best; rather, it is too complex and resource-intensive. By contrast, because they use reduced instruction set computing (RISC), ARM processors are simpler in design, much more compact, and can run faster. Thus ARM-based systems are popular in smartphones and other small devices. At first, Microsoft intended to introduce full ARM support in Windows 10, but in the end it was pushed into Windows 11. In comparison with a traditional Windows laptop, Windows on ARM offers superior battery life, always-on internet connectivity using 4G or 5G, super-fast boot times, and chipset-level security support. It takes a long time for a paradigm shift to occur!

Posted on June 2, 2023

Two days ago, Amazon announced it would pay more than $30 million in fines to settle allegations that its Alexa voice assistant and Ring doorbell camera violated privacy laws. A lawsuit filed by the Federal Trade Commission (FTC) alleges that Amazon kept records of children's conversations with Alexa in violation of privacy laws, while another alleges that its employees viewed recordings from Ring cameras without consent. Amazon would also be prohibited from using the predictive models built upon these data. Despite the FTC's rulings, Amazon argued that it had not broken any laws.  

Full article: https://www.npr.org/2023/06/01/1179381126/amazon-alexa-ring-settlement

That’s my take on it: Big data analytics and machine learning have made Amazon, Google, and Facebook researchers better psychologists and sociologists than academicians, since the former group can access oceans of behavioral data collected in naturalistic settings. When we are unaware of the data collection, these data tend to reveal our true character and behaviors. Needless to say, invasion of privacy is a concern. However, before we point our fingers at Amazon, Google, and Facebook, we should not forget that many well-known psychological studies in the past, such as Milgram's and Zimbardo's, were conducted in the absence of IRB approval and are considered unethical today. It will take some time to fine-tune the ethical standards of behavioral data.

Posted on June 2, 2023

In a recent article (May 31, 2023) published in Towards Data Science, data scientist Col Jung argued that organizations should migrate away from traditional data lakes and adopt a data mesh approach. In Jung's view, organizations using old-fashioned data warehouses are trapped in a mess of data systems connected by innumerable data pipelines. Data lake was introduced as a solution by centralizing diverse data into a hub, but “data lake monsters” are “over-promised and under-realized.” In the era of big data, all analytical questions rest on the shoulders of the data lake team. Consequently, the central data team encountered tremendous scalability problems and became inefficient. To rectify the situation, in 2019 Dehghani proposed data mesh as the next-generation data architecture embracing a decentralized approach. Instead of transferring data to a centralized lake, a data mesh allows domain-specific teams to control and deliver data as products, promoting easy interoperability and accessibility across the organization.

Full article: https://towardsdatascience.com/from-data-lakes-to-data-mesh-a-guide-to-the-latest-enterprise-data-architecture-d7a266a3bc33 

That’s my take on it: In my experience many requests to the central data office are simple questions, but as Jung said, the data team is overwhelmed under the traditional centralized data architecture. The good news is that data meshes facilitate self-service data usage, whereas data lakes do not. Is a decentralized system likely to result in chaos, with different people processing data differently? I don’t think so. Since data products are owned by different domain teams, a data mesh requires stricter formatting, metadata fields, discoverability, and governance standards.
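To make the “data as a product” idea concrete, here is a minimal Python sketch of the kind of self-describing contract a domain team might publish. All names and fields are invented for illustration; they are not taken from Dehghani's specification or any particular platform.

from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str                    # discoverable product name
    owner: str                   # accountable domain team
    schema: dict                 # column name -> type
    sla_freshness_hours: int     # how stale the data may become
    tags: list = field(default_factory=list)  # aids discoverability

orders = DataProduct(
    name="retail.orders.v1",
    owner="retail-domain-team",
    schema={"order_id": "string", "amount": "decimal", "ts": "timestamp"},
    sla_freshness_hours=24,
    tags=["retail", "transactions"],
)
print(orders)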

Posted on May 31, 2023

A group of AI scientists, executives, and academicians released a statement yesterday (5/30) regarding the dangers of AI: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” The signatories include Geoffrey Hinton, the cognitive scientist who helped develop the backpropagation algorithm for neural networks; Yoshua Bengio, the computer scientist who co-developed the idea of “generative adversarial networks”; Sam Altman, the CEO of OpenAI; Demis Hassabis, the CEO of Google DeepMind; and many other leading authorities in the field.

Open statement: https://www.safe.ai/statement-on-ai-risk

That’s my take on it: This type of open letter warning about the dangers of AI has been recurring for a while. Probably these experts really know something that we don't. Based on history, it is very likely that AI will eventually be militarized, and to maximize its power, humans might hand over decision-making to AI. An article by Bill Drexel and Hannah Kelley published in Foreign Affairs suggested that an AI disaster would occur in an authoritarian state without checks and balances, resulting in systemic errors that worsen a mistake or accident. In my humble opinion, systemic mishandling during a disaster could also happen in democratic nations. Remember how badly COVID-19 spread across Europe and the US?

Posted on May 27, 2023

Since high-performance computing for AI is in high demand, NVIDIA, the market leader in GPUs, has seen its stock surge over 25%, outperforming AMD, TSMC, and Intel. NVIDIA is now among the most valuable companies in the US, just behind Apple, Microsoft, Alphabet, and Amazon. According to Louis Navellier, chairman and founder of Navellier & Associates, who was impressed by the rise of NVIDIA, "skeptics of the AI bubble have been silenced for the moment.” On the other hand, Yahoo Finance argued that the NVIDIA stock surge could signal the start of an AI bubble. Citing the history of the Japanese bubble in the late 1980s and early 1990s, the US dot-com bubble of 2000, and the recent Bitcoin bubble, the Yahoo analyst gave this warning: “if history is any guide, guess what, AI could be a larger one.”

Articles: https://www.kiplinger.com/investing/stocks/stock-market-today-stocks-rise-after-ai-outlook-sparks-explosive-nvidia-rally

https://finance.yahoo.com/video/nvidia-stock-surge-could-signal-193731745.html

That’s my take on it: The law of regression toward the mean tells us that whatever goes up eventually comes down, and everything has a tendency to revert to mediocrity. It is not a question of if, but when, the bubble will burst. However, I believe AI is still very much in its infancy and has a lot of room for development. NCSA Mosaic, the first popular Web browser, was released in 1993, and the dot-com bubble took almost a decade to burst. Despite the burst, the Internet is still with us, and investments in Internet infrastructure (such as the fiber optic cables installed during the mid-1990s) continue to benefit the entire world. By the same token, I think it is premature to talk about an AI bubble; AI is here to stay.
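Regression toward the mean is easy to demonstrate by simulation. In the minimal Python sketch below, performers selected for extreme results in one round (skill plus luck) score markedly closer to the average in the next round, simply because the luck does not repeat:

import numpy as np

rng = np.random.default_rng(1)
n = 100_000
skill = rng.normal(0, 1, n)           # stable component
round1 = skill + rng.normal(0, 1, n)  # skill plus luck
round2 = skill + rng.normal(0, 1, n)  # same skill, fresh luck

stars = round1 > np.quantile(round1, 0.99)  # round-1 top 1%
print(round(round1[stars].mean(), 2))  # far above average
print(round(round2[stars].mean(), 2))  # noticeably closer to the mean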
 

 

Posted on May 26, 2023

Recently Adobe released Generative Fill, a feature currently available only in beta, in response to the threat of DALL-E 2, Midjourney, and Stable Diffusion. Like DALL-E 2, Midjourney, and Stable Diffusion, the Adobe generative tool can create images from natural language. But unlike its counterparts, Adobe allows you to select a specific area of the canvas where you want to add, extend, or remove content. For instance, if you think a picture of a seashore is too plain, you can describe an object or scene, such as "add a lighthouse," and in a few seconds a realistic lighthouse appears.

Demo page (Last updated on May 25, 2023): https://helpx.adobe.com/photoshop/using/generative-fill.html

That’s my take on it:

As expected, everything appears smooth in the demo. Frankly speaking, I wasn't impressed with the beta version. As an example, I added a person to a photo of an empty hallway. The prompt I provided was: "A girl is walking and looking out the window." Adobe generated several variants based on my input, but all of the faces are distorted. Furthermore, there is no reflection of the person on the glass (see attached). A paradigm shift is undoubtedly occurring with the rise of AI generative art, but Adobe is late to the game. Only the fittest of the fittest shall survive!

Posted on May 26, 2023

Recently (May 21) Analytics Insight posted a report highlighting the top 10 highest-paying countries in need of data scientists. The order is as follows: USA, Switzerland, UK, Australia, Israel, India, Canada, China, Italy, and France. Take the US as an example. The median salary at the entry-level starts at US$95,000. For experienced data scientists the median pay could be as high as US$165,000.

Article: https://www.analyticsinsight.net/top-10-highest-paying-countries-in-need-of-data-scientists-in-2023/?utm_source=ONTRAPORT-email-broadcast&utm_medium=ONTRAPORT-email-broadcast&utm_term=Newsletter&utm_content=Data+Science+Insider%3A+May+26th%2C+2023&utm_campaign=27052023

That’s my take on it: It surprises me that some countries aren't included in the list. The high-tech sectors of Germany, Japan, South Korea, and Taiwan, for instance, are vibrant and fast-growing, so data scientists should be in high demand there. Either I am missing something or the survey data are incomplete. Out of curiosity, I looked up employment information for data scientists overseas. According to Glassdoor, the average salary of a data scientist in Tokyo is US$55,831 (JPY 7,851,192), while that in Germany is US$72,419 (EUR 67,500). The figures were not adjusted for purchasing power. Nonetheless, I will stay in America!

Posted on May 25, 2023

About a week ago (May 16) Forrester, one of the most trustworthy tech consulting companies in the world, published a report about the current trend of the AI-based decision-support market. The Forrester Wave evaluation report classified companies into four groups: Leaders, Strong Performers, Contenders, and Challengers. SAS and IBM (IBM Cloud Pak) belong to the first category (leader) while TIBCO (Spotfire) is placed in the second group (strong performer). According to Forrester, “SAS seamlessly integrates world-class analytics for decisioning. SAS’s flagship Viya platform includes beautifully designed interfaces across the entire data-to-decision lifecycle. Any combination of analytics, machine learning, and optimization can easily be created and used by teams within SAS Intelligent Decisioning.” “IBM business automation is driven by AI decisions. IBM’s AI decisioning platform is comprised of IBM Automation Decision Services (ADS) available in IBM Cloud Pak for Business Automation and IBM Watson Studio, and IBM OpenScale available in IBM Cloud Pak for Data.”

Report: https://reprints2.forrester.com/#/assets/2/202/RES178488/report

That’s my take on it: Contrary to popular belief, and in spite of constant challenges from open source, proprietary software packages are still alive and well. Many people compare SAS, Python, and R, but it is like comparing apples and oranges. Python and R are programming languages, whereas SAS and IBM offer integrated systems and platforms. When we need to implement DSML in a complicated environment with big data, we need a comprehensive system with a user-friendly interface, rather than a DOS-like, command-based environment. Just my 2 cents.

Posted on May 18, 2023

Today (5/18) UC Berkeley announced that it will open a College of Computing, Data Science and Society, which is expected to be approved by the University of California Board of Regents. During the 2025-26 academic year, a new college building will house the data science major that was launched five years ago, along with other computer science degree programs. More than 89 campuses have access to the online curriculum, which includes assignments, slides, and readings; UC Berkeley has disseminated its curriculum to other colleges and universities for free. Beginning this fall, there will be UC Berkeley-led data science classes at six California community colleges, four Cal State campuses, and Howard, Tuskegee, Cornell, Barnard, and the United States Naval Academy.

Full article: https://www.latimes.com/california/story/2023-05-18/uc-berkeley-spreads-the-gospel-of-data-science-with-new-college-free-curriculum

That’s my take on it:

It is important to point out that this data science conglomerate is not the result of a merely top-down decision; rather, it reflects huge faculty and student demand. Data science has risen to the fourth most popular major at UC Berkeley in just five years. Faculty and students at UC Berkeley are aware of the importance of data science. In June 2023, UC Berkeley received three gifts totaling $75 million to support the construction of the data science center, two of them from current Berkeley faculty. Needless to say, good leaders must pay attention to bottom-up movements; they must be active listeners who can constantly learn and adapt to change.

Posted on May 9, 2023

IBM CEO Arvind Krishna announced a hiring freeze last week (May 7). He also said that nearly 8,000 jobs at the company will be replaced by AI. Over the next five years, machines may take over up to 30% of non-customer-facing roles. In the near future, robots and algorithms will likely pose a significant threat to workers in fields like finance, accounting, and HR. The upside of this transformation is that AI is expected to contribute $16 trillion to the global economy by 2030.

Report: https://finance.yahoo.com/news/ibm-plans-replace-nearly-8-174052360.html?guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAAGHMQWxx33K2TXzdlFFvmZILHZVQQpRtRRCwC7I5_9bc2dwUiB6GS4g7e58lzfMKLgWXSYfPH9_CR68-fRvUD2YHfQHCr58POZIcVZ_smwVJiKcHCIdOwOsBxj4uruQrI11LHfmCmHVdHp9v5waiR5GKt44YblsgqykakA-W90VP&utm_medium=email&_hsmi=257447386&_hsenc=p2ANqtz--w4gVNs1s-ieC1IOjHA9Gt1w5hCvLtkydqmuQE7PccVF955PvehkElxIWb0CDUeIRgPZ17NnZhHdF46Kh3euc8oAtM0g&utm_content=257447386&utm_source=hs_email&guccounter=1

That's my take on it: The writing has been on the wall for a long time, and academicians are not immune to this trend. In the past, it took expertise to perform a data transformation when a nonlinear function would fit the data better; with neural networks, the transformation can be automated in a matter of seconds. To deal with this trend, higher education must reform its curricula; otherwise, graduates with outdated skills will find themselves unable to find jobs.
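To make that point concrete, here is a minimal sketch in Python with scikit-learn on simulated data: the linear regression needs the analyst to supply the log transform, while a small neural network learns the nonlinearity from the raw data. The data and settings are invented for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x = rng.uniform(1, 100, size=(500, 1))
y = np.log(x).ravel() + rng.normal(0, 0.1, 500)  # nonlinear ground truth

# Traditional route: the analyst must know to log-transform x first.
manual = LinearRegression().fit(np.log(x), y)

# Neural-network route: the nonlinearity is learned automatically.
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0),
).fit(x, y)

print(round(manual.score(np.log(x), y), 3))  # R^2 with the hand-picked transform
print(round(mlp.score(x, y), 3))             # R^2 with no manual transform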

Posted on May 5, 2023

White House officials announced yesterday (May 4) that more funding and policy guidance will be provided for developing responsible artificial intelligence before the Biden administration meets with industry executives. The National Science Foundation plans to invest $140 million in seven new AI-dedicated institutes, bringing the total to 25. With the goal of making the United States a leader in AI innovation while ensuring that it is developed and used responsibly, the newly created National Artificial Intelligence Initiative Office will work with academic institutions, government agencies, and industry leaders to address issues such as bias, privacy, and transparency.

Website: https://www.theverge.com/2023/5/4/23710533/google-microsoft-openai-white-house-ethical-ai-artificial-intelligence
That’s my take on it: AI/DSML spans almost all disciplines, rather than being confined to science and engineering. These diverse institutes devoted to AI will cover a wide range of topics, including ethical issues and AI's impact on politics, society, economics, and more. Thus, philosophers, psychologists, sociologists, and economists, alongside researchers in other disciplines, will have funding opportunities. Now is the time to act!

Posted on April 30, 2023

A few days ago (April 26), in a statement published by the Association for Mathematical Consciousness Science (AMCS), a group of more than 150 researchers specializing in mathematical and computational methods for understanding consciousness warned that AI is advancing at a pace that is outstripping our understanding of its ethical, legal, and political implications. Language models such as Google's Bard and OpenAI's ChatGPT now mimic animal brain neural networks, but will soon be constructed to replicate higher-level brain architectures, and thus it is essential for AI researchers to study the nature of consciousness. According to the letter, “there are over 30 models and theories of consciousness (MoCs and ToCs) in the peer-reviewed scientific literature, which already include some important pieces of the solution to the challenge of consciousness.”

Open letter: https://amcs-community.org/open-letters/

That’s my take on it: Cognitive science and philosophy of mind have long been intrigued by the concept of consciousness. Traditionally, this type of research was considered purely theoretical and "academic." Today, it has a wide range of practical implications. Reductive materialism asserts that conscious phenomena arise solely from neurological structures; if that is so, consciousness could emerge from material (machines). About 10 years ago, former Arizona State University faculty member Lawrence Krauss predicted that in the near future there will be self-aware computers. Let's see how it goes.

Posted on April 22, 2023

Recently a German photographer named Boris Eldagsen refused the Sony World Photography Award after admitting to being a “cheeky monkey” by generating the award-winning image with AI. Eldagsen used a pseudonym to submit the AI-generated photo, and the judges selected it as the winner. In an open statement, Eldagsen wrote, “We, the photo world, need an open discussion. A discussion about what we want to consider photography and what not. Is the umbrella of photography large enough to invite AI images to enter – or would this be a mistake?... AI images and photography should not compete with each other in an award like this. They are different entities. AI is not photography. Therefore I will not accept the award.”

That’s my take on it: Does AI-enabled imaging qualify as photography? It depends. This type of debate is not entirely new. When digital photography was introduced, some traditional photographers disliked images manipulated by computer software, such as Adobe Photoshop, arguing that those images are no longer authentic and natural. Yet in the past, photographers used a variety of filters and darkroom techniques to enhance their images. For me, a tool is a tool, whether it is physical, digital, or AI-enabled. The image, however, should not be considered photography if it was entirely created by AI without input from the photographer.

Posted on April 11, 2023

In response to the arrival of ChatGPT, a group of prominent AI researchers recently signed an open letter calling for a slowdown in the development of AI systems that can pass the Turing test. The Turing test measures a machine's ability to exhibit intelligent behavior that is indistinguishable from human behavior. Yoshua Bengio, one of the leading experts in deep learning, co-signed the letter.

Bengio wrote, “I found it appropriate to sign this letter to alert the public to the need to reduce the acceleration of AI systems development currently taking place at the expense of the precautionary principle and ethics. There is no guarantee that someone in the foreseeable future won’t develop dangerous autonomous AI systems with behaviors that deviate from human goals and values. The short and medium-term risks –manipulation of public opinion for political purposes, especially through disinformation– are easy to predict, unlike the longer-term risks –AI systems that are harmful despite the programmers’ objectives, and I think it is important to study both.”

Full article: https://yoshuabengio.org/2023/04/05/slowing-down-development-of-ai-systems-passing-the-turing-test/

That’s my take on it: Bengio cited the precautionary principle to argue for slowing down AI development. According to the precautionary principle, if an action could potentially cause harm to the public or to the ecology, and there is no scientific consensus, the burden of proving that the action is not harmful falls on the shoulders of the party taking it. Because most AI developers are not philosophers of ethics or legal experts, this places a heavy burden on them. I think there is no need to slow down AI development; instead, experts from different disciplines should be part of every development team, and there should be opportunities to engage in open debates and discussions regarding AI ethics.

Posted on March 31, 2023

Recently researchers at IBM Research Zürich and ETH Zürich developed the Neuro-Vector-Symbolic Architecture (NVSA), which combines connectionist deep learning and vector-based symbolic computations. Rather than learning from past data alone, the system is given prior background information. While deep neural networks perform well on perception tasks, symbolic computing is good at processing high-dimensional vectors. According to the research team, this integrated approach outperformed both state-of-the-art deep neural networks and neuro-symbolic AI approaches, achieving new record accuracies of 87.7% and 88.1% on the RAVEN and I-RAVEN datasets, respectively.

Full article: https://www.nature.com/articles/s42256-023-00630-8

That’s my take on it: The idea of the perceptron was proposed by Frank Rosenblatt in 1958, but it was criticized by Marvin Minsky and Seymour Papert in 1969. After that, the symbolist approach to AI became the dominant paradigm. Only after backpropagation was developed in the 1980s did neural networks become a major direction of machine learning research. Symbolism and connectionism have been rivals for a while, but in recent years researchers have attempted to integrate them. For example, two years ago researchers at MIT proposed the Neuro-Symbolic Concept Learner (NS-CL), which incorporates both symbolism and connectionism. NVSA is another research endeavor along this line. I believe there will be more attempts to synthesize symbolism and connectionism in the AI research community.
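For readers who have never seen vector-symbolic computation, here is a toy Python sketch of the core binding and unbinding trick using circular convolution, as in holographic reduced representations, one classic vector-symbolic scheme. This is only an illustration of the general idea, not the NVSA implementation:

import numpy as np

rng = np.random.default_rng(42)
d = 4096  # in high dimensions, random vectors are nearly orthogonal

def rand_vec():
    return rng.normal(0, 1 / np.sqrt(d), d)

def bind(a, b):
    # Circular convolution binds a role vector to a filler vector.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):
    # Circular correlation approximately recovers b from bind(a, b).
    return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(a))))

color, red = rand_vec(), rand_vec()
pair = bind(color, red)          # encode the role-filler pair "color = red"
recovered = unbind(pair, color)  # query: what was bound to "color"?

cos = recovered @ red / (np.linalg.norm(recovered) * np.linalg.norm(red))
print(round(cos, 2))  # well above 0: the noisy result is recognizably "red"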

Posted on March 28, 2023

A new version of Microsoft Bing, powered by ChatGPT, is now available for beta testing. With the revamped search engine, queries written in natural language will be recognized and answered by OpenAI technology. If you have a ChatGPT account, do you need Bing? I tested both AI systems by entering the same questions. The answers to simple questions about factual information were very similar, but when the questions were complicated, such as comparing two schools of thought, the answers were very different.

Sample question: What is the key difference between Gary Marcus and Yann LeCun in terms of their perspectives on AI?

The answer from Bing is short. It provides footnotes but the sources are non-academic.

Gary Marcus and Yann LeCun are both prominent figures in the field of artificial intelligence. However, they have historically sharply different perspectives and opinions on the development and future of AI, specifically deep learning. LeCun is famous for his pioneering work in the field of deep learning and computer vision(1). He believes that deep learning is the key to unlocking true AI(2). On the other hand, Gary Marcus has been a vocal critic of deep learning and has argued that it is not sufficient for achieving true AI(1). He believes that AI needs to incorporate more symbolic reasoning and common sense knowledge(3).

1. rebellionsearch.com

2. zdnet.com

3. zdnet.com

The answer from ChatGPT is more elaborate, but no references are provided.

Gary Marcus and Yann LeCun have different perspectives on the field of artificial intelligence, specifically regarding the role of deep learning and the importance of incorporating different forms of knowledge in AI systems.

Gary Marcus is a cognitive psychologist and professor of psychology at New York University, and he has been critical of the current state of deep learning and the limitations of the approach. He argues that while deep learning has been successful in certain areas, such as image and speech recognition, it is not capable of achieving general intelligence and that current approaches are too narrowly focused. He also believes that deep learning is insufficient on its own and that AI systems need to incorporate other forms of knowledge, such as rules and logical reasoning, to achieve true intelligence.

Yann LeCun, on the other hand, is a computer scientist and professor at New York University who has been instrumental in the development of deep learning and convolutional neural networks. He believes that deep learning is a crucial component of achieving general intelligence and that it is capable of handling a wide range of tasks, including those that require reasoning and abstraction. He argues that the key to improving AI is to continue developing deep learning algorithms and architectures and to scale up the amount of data used to train these systems.

Posted on March 23, 2023

The NVIDIA conference today featured a panel discussion entitled "AI Education and Access at Scale." The panel consisted of four speakers, all of them inspirational. Because of space constraints, I can only highlight one: Joe Glover, Provost and Senior VP for Academic Affairs at the University of Florida (UF). A few years ago UF launched a university-wide AI initiative that aims to prepare students and faculty to use AI technology to solve pressing challenges across every field. To achieve this goal, UF formed a partnership with NVIDIA to build an AI supercomputing infrastructure. The UF program covers all disciplines at all levels, including arts and humanities. According to Glover, some people were skeptical at first because the idea is so out of the ordinary. Nonetheless, he argued that AI is an encompassing technology that can be applied to all disciplines. For instance, UF musicologist Imani Mosley utilized AI to discover patterns on Spotify, whereas UF geology professor Mickie Mackie conducted research on improving sea level rise predictions with the help of machine learning. To facilitate the integration of AI with various disciplines, UF hired a team of programmers to assist faculty with the technical aspects. The recording of the session can be accessed at:

https://register.nvidia.com/flow/nvidia/gtcspring2023/attendeeportal/page/sessioncatalog/session/1665610946330001j59w

That’s my take on it: I totally agree with Dr. Joe Glover. Indeed, this is exactly what I have been trying to do for a long time. There are countless possibilities for implementing AI in almost all disciplines. For example, instead of manually coding the corpus, literature and history researchers can use text mining to analyze archives. By utilizing AI generative tools, artists can spend more time conceptualizing instead of repeating tedious tasks. Needless to say, social scientists can overcome the shortcomings of traditional statistics and solve the replication crisis by equipping themselves with machine learning tools and big data analytics. I will look into the UF model, and I hope NVIDIA has more supercomputers to offer!

Posted on March 22, 2023

AI-enabled generative art tools, such as Midjourney and Stable Diffusion, have been taking the world by storm. In response to this market trend, both Adobe and Microsoft recently announced beta versions of their own AI-based text-to-image tools. Adobe's generative AI tool is known as Firefly, which will be integrated into Creative Cloud, Document Cloud, Experience Cloud, and Adobe Express workflows. The Microsoft product, which is based on OpenAI's DALL-E, will be integrated into Bing. Currently, both are accepting beta testers. You can sign up for them at

https://www.adobe.com/sensei/generative-ai/firefly.html

https://www.bing.com/create

That’s my take on it: I am still waiting for approval from Adobe. Nonetheless, I have access to the Microsoft AI tool. Frankly speaking, it is disappointing. I applied the same or similar prompts that I used in Midjourney to the Microsoft tool, such as “a long-haired girl wearing a long white dress is holding a horse on a beach during sunset time” and “a dancer in waterfalls.” As you can see in the attachments, the portraits are either disfigured or too dark. These are just a few examples. Nothing that I have tried to make with the tool so far is acceptable.

This morning I attended the talk “Are Generative Models the Key to Achieving Artificial General Intelligence?” at the NVIDIA conference. The presenter explained how the diffusion models used by Midjourney and Stable Diffusion work: through zero-shot problem solving, a generative model trained on low-resolution images can predict high-resolution ones. I think that is why Midjourney and Stable Diffusion are superior. There is still a long way to go for Microsoft, and it is my hope that Adobe will do a better job. For curious readers, a rough sketch of the diffusion idea follows.
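
As a companion to the talk, here is a minimal NumPy sketch of the forward (noising) half of a DDPM-style diffusion model, the family behind Stable Diffusion. The linear noise schedule and the tiny 8x8 "image" are illustrative assumptions, and the learned network that reverses the corruption is omitted.

import numpy as np

# Forward diffusion: corrupt a clean image x0 with Gaussian noise over T steps.
# Closed form: q(x_t | x_0) = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise,
# where abar_t is the cumulative product of (1 - beta_t).
rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # per-step noise variances (linear schedule)
abar = np.cumprod(1.0 - betas)       # fraction of signal retained at each step

x0 = rng.standard_normal((8, 8))     # stand-in for a tiny grayscale image

def sample_xt(x0, t):
    """Sample the noised image x_t directly from x_0."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

# Early steps keep most of the signal; by t = 999 it is almost pure noise.
for t in (0, 500, 999):
    print(f"t={t}: signal weight = {np.sqrt(abar[t]):.3f}")

Generation runs this process in reverse: a trained denoiser starts from pure noise and removes a little of it at each step until an image emerges.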


Posted on March 21, 2023

This week NVIDIA is hosting its annual online conference to unveil its new offerings, such as a cloud-based supercomputing service. This morning (3/21) NVIDIA founder and CEO Jensen Huang shared how NVIDIA's accelerated computing platform is driving the next wave in AI, the metaverse, cloud technologies, and sustainable computing. In addition to the keynote, there are many informative sessions, such as “Using AI to Accelerate Scientific Discovery,” “Generative AI Demystified,” “Deep Reinforcement Learning with Real-World Data,” and “Accelerating Exploratory Data Analysis at LinkedIn.” At the keynote, Huang announced NVIDIA AI Foundations, which will be deployed to Google Cloud, Microsoft Azure, and Oracle Cloud. In the presentation, Huang kept repeating this phrase: “We are at the iPhone moment of AI.” All sessions are recorded and can be accessed at:

https://register.nvidia.com/flow/nvidia/gtcspring2023/attendeeportal/

Posted on March 17, 2023

In response to the challenge from ChatGPT, two days ago (March 15) China’s AI developer Baidu released "Wen Xin Yi Yan" at its Beijing headquarters. Its text generation mode is similar to that of ChatGPT, but in addition it can read its answers aloud in real time in various Chinese dialects, including Cantonese and the Sichuan dialect. Moreover, the content can be rendered as pictures and videos in real time. Robin Li, Chairman and CEO of Baidu, demonstrated the capabilities of "Wen Xin Yi Yan" in five usage scenarios: literary creation, commercial copywriting, mathematical calculation, Chinese comprehension, and multi-modal generation. He admitted that in internal testing the experience of "Wen Xin Yi Yan" was not perfect, but given the strong demand in the market, he decided to release the product as soon as possible. At present, "Wen Xin Yi Yan" handles Chinese better than English, and its English ability will be improved in the future. Since the official announcement last month that "Wen Xin Yi Yan" would be released, 650 partners have joined, and more related products will appear in the short term. Li emphasized that "Wen Xin Yi Yan" is not a tool for technological confrontation between China and the United States, but a brand-new platform for serving hundreds of millions of users and empowering thousands of industries. Starting today, the first batch of users can try the product on the official "Wen Xin Yi Yan" website with an invitation code, and access will be opened to more users in succession.

There are more than 260 billion parameters in Baidu's chatbot model, more than in GPT-3, but some critics believe its performance is not as good as ChatGPT's, partly due to the lack of web-based Chinese information available for training.

Full text: https://www.scmp.com/tech/big-tech/article/3213605/chinese-online-search-giant-baidu-launch-its-answer-chatgpt-shadow-openais-upgraded-gpt-4-model

That’s my take on it: Perhaps the biggest hurdle to China's chatbot development is not technological; rather, there are too many red lines. When a tester input a sensitive question into China's chatbot, the system refused to answer: “The question could not pass a safety review. No response could be generated for you.” When the reporter pushed further by asking, “Why did my question fail to pass the safety review?” the answer was: “Let’s change the topic and talk about something else.” In contrast, ChatGPT handles sensitive or controversial questions differently: although its answers are usually vague and balanced, at least it gives the user objective facts and lets them decide.

Posted on March 3, 2023

According to recent research conducted by two cognitive psychologists at the Max Planck Institute for Biological Cybernetics in Tübingen, GPT-3 is comparable to humans in some areas but lags behind in others. One of the questions the researchers presented to GPT-3 is the classic Linda problem (I use it in my statistics and probability class):

Linda is 31 years old. She majored in philosophy. She was deeply concerned with issues of social justice and discrimination. Which of the following statements is more probable?

A: Linda is a bank teller.

B: Linda is a bank teller and active in the feminist movement.

The correct answer is A because B is a subset of A: the probability that two events both occur can never exceed the probability of either event alone. But most respondents picked B, which is the conjunction fallacy. Interestingly, the researchers at the Max Planck Institute found that GPT-3 committed the same mistake as humans. They concluded that in searching for specific information or in causal reasoning, AI fails to use logic; rather, it passively absorbs information from texts available on the Internet. When the majority is wrong, AI replicates the error. A short simulation of the underlying probability rule follows.
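
Here is a minimal Monte Carlo sketch in Python of why statement A must be at least as probable as statement B. The 30% and 40% base rates (and the independence assumption) are made up purely for illustration; the inequality holds regardless of the numbers chosen.

import random

random.seed(42)
N = 100_000
teller = 0
teller_and_feminist = 0

for _ in range(N):
    is_teller = random.random() < 0.30    # hypothetical base rate
    is_feminist = random.random() < 0.40  # hypothetical base rate
    if is_teller:
        teller += 1
        if is_feminist:
            teller_and_feminist += 1

# Every "teller and feminist" case is also a "teller" case, so the
# conjunction can never be the more probable statement.
print(f"P(A) = P(teller)            ~ {teller / N:.3f}")
print(f"P(B) = P(teller & feminist) ~ {teller_and_feminist / N:.3f}")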

Full paper: https://arxiv.org/pdf/2206.14576.pdf

That’s my take on it: Out of curiosity, I also entered the same question into ChatGPT. Unlike the result obtained by the two researchers at the Max Planck Institute, I received a more “typical” response. If a student submits a vague answer like this, it might be sourced from AI: “Based on the information provided, statement A or B cannot be determined as more probable. This is because Linda's age and major in philosophy do not provide any indication of her profession or involvement in social movements. While her concern with social justice and discrimination suggests that statement B could be possible, there is no clear evidence to support either statement. Therefore, it would be more accurate to say that both statements A and B are possible, but there is not enough information to determine which one is more probable.”

Posted on March 3, 2023

Recently the Harvard Business Review (HBR) reported that many retail companies have not taken advantage of advanced data analytics to improve their business. There are exceptions: Walmart, Amazon, and a few others. The 25 best-performing retailers during the pandemic generated 83% more profit than laggards and captured more than 90% of the sector's market capitalization gains. By interviewing 24 business leaders, HBR unveiled six sticking points as follows: