天天看點

讓老闆成為資料分析師--ChatGpt連結本地資料源實戰測試

作者:技術聯盟總壇

慈修 阿裡開發者 2023-06-28 09:01 發表于北京

讓老闆成為資料分析師--ChatGpt連結本地資料源實戰測試

阿裡妹導讀

本文探究ChatGpt等AI機器人能否幫助老闆快速的做資料分析?用自然語言同老闆進行溝通,滿足老闆的所有資料分析的訴求?

一、背景

設想這樣一個場景:你是某貿易公司的老闆,公司所有的日常運轉資料都在私域的進銷存系統,包括客戶、供應商、銷售、庫存、進貨、商品等,你每天需要關注公司營運情況,并且希望商業分析師對目前營運資料給出分析和建議,對公司的營運政策進行及時調整。ChatGpt等AI機器人能否幫助老闆快速的做資料分析?用自然語言同老闆進行溝通,滿足老闆的所有資料分析的訴求?

二、工作原理

衆所周知,ChatGpt的資料截止隻至2021年9月且無法直接聯同網際網路和區域網路(雖然plus賬戶可以使用插件來擴充chatgpt能力比如聯網,讀取本地文檔等,但輸入和輸出tokens有限制且月20刀的費用讓人望而卻步),那我們該如何把chatgpt能力同本地能力結合呢?

為了滿足上述的需求,利用LangChain(語言模型驅動的應用程式的開發架構,具體可問ChatGpt)的資料感覺能力(将語言模型同其他資料源連結)和代理能力(使語言模型能夠與其環境進行互動),讓老闆通過自然語言同引擎進行互動,用老闆思維提問,傳回給老闆各種次元的資料甚至對應的可視化表達。(下圖是網上拿來的LangChain連結本地檔案和語言模型的示意圖,大緻可以說明本次實驗的原理)。

讓老闆成為資料分析師--ChatGpt連結本地資料源實戰測試

三、構造資料模型和測試資料

利用ChatGpt給出進銷存系統的資料模型和測試資料,模拟某貿易公司的日常營運資料。

CREATE TABLE Product (

    ProductID INT PRIMARY KEY,

    ProductName VARCHAR(255),

    ProductDescription TEXT,

    CostPrice DECIMAL(10,2),

    SalePrice DECIMAL(10,2)

);




CREATE TABLE Supplier (

    SupplierID INT PRIMARY KEY,

    SupplierName VARCHAR(255),

    ContactPerson VARCHAR(255),

    ContactPhone VARCHAR(20),

    Address TEXT

);




CREATE TABLE Customer (

    CustomerID INT PRIMARY KEY,

    CustomerName VARCHAR(255),

    ContactPerson VARCHAR(255),

    ContactPhone VARCHAR(20),

    Address TEXT

);




CREATE TABLE Inventory (

    InventoryID INT PRIMARY KEY,

    ProductID INT,

    Quantity INT,

    FOREIGN KEY (ProductID) REFERENCES Product(ProductID)

);




CREATE TABLE PurchaseOrder (

    PurchaseOrderID INT PRIMARY KEY,

    SupplierID INT,

    OrderDate DATE,

    FOREIGN KEY (SupplierID) REFERENCES Supplier(SupplierID)

);




CREATE TABLE SalesOrder (

    SalesOrderID INT PRIMARY KEY,

    CustomerID INT,

    OrderDate DATE,

    FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)

);




CREATE TABLE PurchaseOrderDetail (

    PurchaseOrderDetailID INT PRIMARY KEY,

    PurchaseOrderID INT,

    ProductID INT,

    Quantity INT,

    Price DECIMAL(10,2),

    FOREIGN KEY (PurchaseOrderID) REFERENCES PurchaseOrder(PurchaseOrderID),

    FOREIGN KEY (ProductID) REFERENCES Product(ProductID)

);




CREATE TABLE SalesOrderDetail (

    SalesOrderDetailID INT PRIMARY KEY,

    SalesOrderID INT,

    ProductID INT,

    Quantity INT,

    Price DECIMAL(10,2),

    FOREIGN KEY (SalesOrderID) REFERENCES SalesOrder(SalesOrderID),

    FOREIGN KEY (ProductID) REFERENCES Product(ProductID)

);           

利用ChatGpt自動構造測試資料,檢查了主鍵外鍵均能一一對應,給ChatGpt點贊。

-- Insert into Product
INSERT INTO Product (ProductID, ProductName, ProductDescription, CostPrice, SalePrice) VALUES
(1, 'Product1', 'Description1', 10.00, 20.00),
(2, 'Product2', 'Description2', 15.00, 25.00),
(3, 'Product3', 'Description3', 20.00, 30.00),
(4, 'Product4', 'Description4', 25.00, 35.00),
(5, 'Product5', 'Description5', 30.00, 40.00),
(6, 'Product6', 'Description6', 35.00, 45.00),
(7, 'Product7', 'Description7', 40.00, 50.00),
(8, 'Product8', 'Description8', 45.00, 55.00),
(9, 'Product9', 'Description9', 50.00, 60.00),
(10, 'Product10', 'Description10', 55.00, 65.00);


-- Insert into Supplier
INSERT INTO Supplier (SupplierID, SupplierName, ContactPerson, ContactPhone, Address) VALUES
(1, 'Supplier1', 'Contact1', '1234567890', 'Address1'),
(2, 'Supplier2', 'Contact2', '1234567890', 'Address2'),
(3, 'Supplier3', 'Contact3', '1234567890', 'Address3'),
(4, 'Supplier4', 'Contact4', '1234567890', 'Address4'),
(5, 'Supplier5', 'Contact5', '1234567890', 'Address5'),
(6, 'Supplier6', 'Contact6', '1234567890', 'Address6'),
(7, 'Supplier7', 'Contact7', '1234567890', 'Address7'),
(8, 'Supplier8', 'Contact8', '1234567890', 'Address8'),
(9, 'Supplier9', 'Contact9', '1234567890', 'Address9'),
(10, 'Supplier10', 'Contact10', '1234567890', 'Address10');


-- Insert into Customer
INSERT INTO Customer (CustomerID, CustomerName, ContactPerson, ContactPhone, Address) VALUES
(1, 'Customer1', 'Contact1', '1234567890', 'Address1'),
(2, 'Customer2', 'Contact2', '1234567890', 'Address2'),
(3, 'Customer3', 'Contact3', '1234567890', 'Address3'),
(4, 'Customer4', 'Contact4', '1234567890', 'Address4'),
(5, 'Customer5', 'Contact5', '1234567890', 'Address5'),
(6, 'Customer6', 'Contact6', '1234567890', 'Address6'),
(7, 'Customer7', 'Contact7', '1234567890', 'Address7'),
(8, 'Customer8', 'Contact8', '1234567890', 'Address8'),
(9, 'Customer9', 'Contact9', '1234567890', 'Address9'),
(10, 'Customer10', 'Contact10', '1234567890', 'Address10');


-- Insert into Inventory
INSERT INTO Inventory (InventoryID, ProductID, Quantity) VALUES
(1, 1, 100),
(2, 2, 200),
(3, 3, 300),
(4, 4, 400),
(5, 5, 500),
(6, 6, 600),
(7, 7, 700),
(8, 8, 800),
(9, 9, 900),
(10, 10, 1000);


-- Insert into PurchaseOrder
INSERT INTO PurchaseOrder (PurchaseOrderID, SupplierID, OrderDate) VALUES
(1, 1, '2023-01-01'),
(2, 2, '2023-02-01'),
(3, 3, '2023-03-01'),
(4, 4, '2023-04-01'),
(5, 5, '2023-05-01'),
(6, 6, '2023-06-01'),
(7, 7, '2023-07-01'),
(8, 8, '2023-08-01'),
(9, 9, '2023-09-01'),
(10, 10, '2023-10-01');


-- Insert into SalesOrder
INSERT INTO SalesOrder (SalesOrderID, CustomerID, OrderDate) VALUES
(1, 1, '2023-01-01'),
(2, 2, '2023-02-01'),
(3, 3, '2023-03-01'),
(4, 4, '2023-04-01'),
(5, 5, '2023-05-01'),
(6, 6, '2023-06-01'),
(7, 7, '2023-07-01'),
(8, 8, '2023-08-01'),
(9, 9, '2023-09-01'),
(10, 10, '2023-10-01');


-- Insert into PurchaseOrderDetail
INSERT INTO PurchaseOrderDetail (PurchaseOrderDetailID, PurchaseOrderID, ProductID, Quantity, Price) VALUES
(1, 1, 1, 10, 10.00),
(2, 2, 2, 20, 15.00),
(3, 3, 3, 30, 20.00),
(4, 4, 4, 40, 25.00),
(5, 5, 5, 50, 30.00),
(6, 6, 6, 60, 35.00),
(7, 7, 7, 70, 40.00),
(8, 8, 8, 80, 45.00),
(9, 9, 9, 90, 50.00),
(10, 10, 10, 100, 55.00);


-- Insert into SalesOrderDetail
INSERT INTO SalesOrderDetail (SalesOrderDetailID, SalesOrderID, ProductID, Quantity, Price) VALUES
(1, 1, 1, 10, 20.00),
(2, 2, 2, 20, 25.00),
(3, 3, 3, 30, 30.00),
(4, 4, 4, 40, 35.00),
(5, 5, 5, 50, 40.00),
(6, 6, 6, 60, 45.00),
(7, 7, 7, 70, 50.00),
(8, 8, 8, 80, 55.00),
(9, 9, 9, 90, 60.00),
(10, 10, 10, 100, 65.00);           

四、測試環境搭建

1、資料庫采用本地搭建的mysql 8.0 ;

2、大語言模型采用OpenAI,GPT3 模型;

3、LangChain 0.0.191;

3、資料用ChatGpt自動構造的測試資料,故不存在資料安全和合規問題;

五、實驗測試過程

本次實驗測試過程取老闆在日常中比較常用的場景進行測試,比如查供應商、商品、庫存、銷量、營收和利潤等。

場景1 :

輸入:我們總共有多少個供應商,給我們供應了多少種商品?

生成sql:

SELECT COUNT(DISTINCT SupplierID) AS 'Number of Suppliers', COUNT(DISTINCT ProductID) AS 'Number of Products' FROM Supplier, Product           
輸出:Final Answer: There are 10 suppliers and 10 products.           

結論:答案正确✅

引擎分析的過程如下:

讓老闆成為資料分析師--ChatGpt連結本地資料源實戰測試

場景2 :

輸入:每一種商品的庫存有多少?

生成的sql:

SELECT ProductID, Quantity FROM Inventory           

輸出:Final Answer: The quantity of each product in the Inventory table is: ProductID 1 has 100, ProductID 2 has 200, ProductID 3 has 300, ProductID 4 has 400, ProductID 5 has 500, ProductID 6 has 600, ProductID 7 has 700, ProductID 8 has 800, ProductID 9 has 900, ProductID 10 has 1000.

結論:答案正确✅

引擎分析的過程如下:

讓老闆成為資料分析師--ChatGpt連結本地資料源實戰測試

場景3 :

輸入:按月給出所有商品的銷量和庫存?

生成的sql:

SELECT MONTH(sod.SalesOrderID) AS Month, p.ProductName, SUM(sod.Quantity) AS Sales, i.Quantity AS Inventory FROM SalesOrderDetail sod INNERJOIN Product p ON sod.ProductID = p.ProductID INNER JOIN Inventory i ON p.ProductID = i.ProductID GROUP BY Month, p.ProductName, i.InventoryID ORDER BY Month ASC LIMIT 10            

輸出:Final Answer: The sales and inventory for each product by month are: Product1 (10 sales, 100 inventory), Product2 (20 sales, 200 inventory), Product3 (30 sales, 300 inventory), Product4 (40 sales, 400 inventory), Product5 (50 sales, 500 inventory), Product6 (60 sales, 600 inventory), Product7 (70 sales, 700 inventory), Product8 (80 sales, 800 inventory), Product9 (90 sales, 900 inventory), Product10 (100 sales, 1000 inventory).

結論:月度日期不對,部分正确

引擎分析的過程如下:

讓老闆成為資料分析師--ChatGpt連結本地資料源實戰測試
讓老闆成為資料分析師--ChatGpt連結本地資料源實戰測試

場景4 :

輸入:給出每個月的銷量和銷售額?

生成的sql:

"SELECT MONTH(OrderDate) AS Month, SUM(Quantity) AS SalesQuantity, SUM(Quantity * Price) AS SalesAmount FROM SalesOrder JOIN SalesOrderDetail ON SalesOrder.SalesOrderID = SalesOrderDetail.SalesOrderID GROUP BY MONTH(OrderDate) ORDER BY Month ASC LIMIT 10"           

結論:答案正确✅

引擎分析的過程如下:

讓老闆成為資料分析師--ChatGpt連結本地資料源實戰測試

場景5 :

輸入:給出每個月的銷量,營收和利潤?

生成的sql:

SELECT MONTH(OrderDate) AS Month, SUM(Quantity) AS Sales, SUM(Quantity * Price) AS Revenue, SUM(Quantity * Price) - SUM(Quantity * CostPrice) AS Profit FROM SalesOrder JOIN SalesOrderDetail ON SalesOrder.SalesOrderID = SalesOrderDetail.SalesOrderID JOIN ProductON SalesOrderDetail.ProductID = Product.ProductID GROUP BY MONTH(OrderDate) ORDER BY Month ASC LIMIT 10 
           

輸出:Final Answer: The sales, revenue, and profit for each month are: Month 1: Sales 10, Revenue 200.00, Profit 100.00; Month 2: Sales 20, Revenue 500.00, Profit 200.00; Month 3: Sales 30, Revenue 900.00, Profit 300.00; Month 4: Sales 40, Revenue 1400.00, Profit 400.00; Month 5: Sales 50, Revenue 2000.00, Profit 500.00; Month 6: Sales 60, Revenue 2700.00, Profit 600.00; Month 7: Sales 70, Revenue 3500.00, Profit 700.00; Month 8: Sales 80, Revenue 4400.00, Profit 800.00; Month 9: Sales 90, Revenue 5400.00, Profit 900.00; Month 10: Sales 100, Revenue 6500.00, Profit 1000.00.

結論:答案正确✅

引擎分析過程:

讓老闆成為資料分析師--ChatGpt連結本地資料源實戰測試
讓老闆成為資料分析師--ChatGpt連結本地資料源實戰測試

六、總結

LangChain能夠連結本地資料源到大語言模型,并且利用大語言模型完成指定任務,準确率較高,通過引擎的分析思考過程來看,ChatGpt的思維邏輯類似人類,也會寫錯sql報錯,然後不停重試直至得到最終結果,但仍有一些風險和改進點:

1、資料安全和合規風險,這裡直接連結了本地資料庫,我這裡是構造的測試資料,真正的業務場景不需要,可以等通義千問的開放。

2、由于ChatGptd的API的輸入輸出有4100個Tokens的限制,針對複雜任務,容易超過限制被終端,是以不能一次輸入過多的問題,這個可以通過prompt template來設定一些常用模闆來解決。

3、這次實驗最終的輸出是英文,這個可以設定引擎的語言參數以及輸出結果進行結構化傳回,前端頁面實作可視化圖形展現,這次主要目的為了測試SQL生成能力,展示就不贅述。

繼續閱讀